BEGIN:VCALENDAR
PRODID:-//github.com/ical-org/ical.net//NONSGML ical.net 4.0//EN
VERSION:2.0
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20241103T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20250309T020000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
DESCRIPTION:Large language models have achieved near-saturated performance
  on medical knowledge benchmarks\, yet high exam scores tell us little ab
 out clinical safety or real-world utility. In this talk\, I review three 
 recent studies that collectively reframe how we should evaluate LLMs for 
 clinical use: NOHARM\, which introduces a safety-oriented benchmark revea
 ling that most LLM errors are harmful omissions rather than commissions\;
  MedR-Bench\, which decomposes clinical reasoning into stages and exposes
  critical weaknesses beyond diagnosis\; and the first randomized controll
 ed trial of ambient AI scribes\, which highlights the gap between technic
 al capability and clinical adoption. Together\, these works suggest a par
 adigm shift: from asking "are LLMs smart enough for medicine?" to "how d
 o we rigorously evaluate their safety\, understand their failure modes\, 
 and validate their real-world impact".\n\nSpeaker:\nLingfei Qian\n\nAdmis
 sion:\nFree\n\nDetails URL:\nhttps://medicine.yale.edu/event/nlpllm-inter
 est-group-30/\n
DTEND;TZID=America/New_York:20260406T170000
DTSTAMP:20260514T210308Z
DTSTART;TZID=America/New_York:20260406T160000
LOCATION:Join our mailing list to receive Zoom Passcode: https://mailman.y
 ale.edu/mailman/listinfo/nlp-llm-ig\, URL: https://yale.zoom.us/j/9359994
 1969
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:NLP/LLM Interest Group
UID:91d441fa-91d7-40ed-be24-ff03c3523b04
END:VEVENT
END:VCALENDAR
