Everyone (Public)

AI Frontiers: Hosted by NLP/LLM Interest Group

"Learning to Reason with Multimodal Large Language Models" by Jingyi Zhang, PhD, postdoctoral associate in biomedical informatics & data science

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision and language tasks. However, developing MLLMs with strong, human-like reasoning abilities remains a key challenge, especially in complex real-world domains such as healthcare. In this talk, I will share my recent work on enhancing the reasoning capabilities of MLLMs. I will first introduce our efforts to improve the general reasoning ability of MLLMs through supervised fine-tuning on high-quality multimodal chain-of-thought (CoT) data, which is searched for and generated with a novel tree search algorithm across a wide range of application domains. Next, I will present our study of online reinforcement learning techniques (e.g., GRPO) that incentivize the model to actively explore alternative reasoning paths, unlocking deeper reasoning capabilities through self-improvement. Finally, I will discuss whether synthetic data is ready to address data scarcity and the high cost of data annotation for MLLMs, with a focus on developing effective data synthesis methods that automatically generate multimodal training data to improve MLLMs’ ability to solve complex real-world tasks.

Admission

Free

Event Type

Lectures and Seminars

Food

Snacks
Monday, June 1, 2026