Everyone (Public)

AI Frontiers: Hosted by NLP/LLM Interest Group

Start: 2026-06-01T20:00:00Z
End: 2026-06-01T21:00:00Z
Location: Yale University

"Learning to Reason with Multimodal Large Language Models"

101 College Street

Join our mailing list to receive the Zoom link and passcode: https://mailman.yale.edu/mailman/listinfo/nlp-llm-ig

"Learning to Reason with Multimodal Large Language Models" by Jingyi Zhang, PhD, postdoctoral associate in biomedical informatics & data science

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision and language tasks. However, developing MLLMs with strong, human-like reasoning abilities remains a key challenge, especially in complex real-world domains such as healthcare. In this talk, I will share my recent work on enhancing the reasoning capabilities of MLLMs. I will first introduce our efforts to improve the general reasoning ability of MLLMs through supervised fine-tuning on high-quality multimodal chain-of-thought (CoT) data, which is searched and generated with a novel tree search algorithm across a wide range of application domains. Next, I will introduce our study on online reinforcement learning techniques (e.g., GRPO) that incentivize the model to actively explore alternative reasoning paths, unlocking deeper reasoning capabilities through self-improvement. Finally, I will discuss whether synthetic data is ready to address data scarcity and the high cost of data annotation for MLLMs, with a focus on developing effective data synthesis methods that automatically generate multimodal training data to improve MLLMs’ ability to solve complex real-world tasks.