11/18/2025 | 2:00 PM – 3:15 PM | Room 7
S80: Divide, Conquer, Combine: The Power of Multi-Hop, Multi-Agent, Multi-Task AI in Medicine
Presentation Type: Oral Presentations
MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling
Presentation Time: 02:00 PM - 02:12 PM
Abstract Keywords: Artificial Intelligence, Large Language Models (LLMs), Machine Learning
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
We propose Mixture-of-Multimodal-Agents (MoMA), a novel architecture using large language model (LLM) agents to integrate multimodal EHR data for clinical prediction. MoMA converts non-text data (e.g., images, labs) into textual summaries, aggregates them with clinical notes, and produces predictions. Evaluated on two real-world tasks, MoMA outperforms state-of-the-art methods, demonstrating improved accuracy, robustness, and generalizability in clinical prediction from heterogeneous data sources.
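The flow the abstract describes — specialist agents turn each non-text modality into a textual summary, the summaries are aggregated with the clinical note, and a predictor agent produces the output — can be sketched as follows. This is a minimal illustration with hypothetical function names and toy rule-based stand-ins for the LLM agents, not the authors' implementation.

```python
# Toy sketch of a MoMA-style pipeline. The summarizer and predictor
# functions below stand in for LLM agent calls.

def summarize_image(image_ref: str) -> str:
    # Stand-in for a vision-language "specialist" agent.
    return f"Imaging summary for {image_ref}: no acute findings."

def summarize_labs(labs: dict) -> str:
    # Stand-in for a structured-data "specialist" agent.
    return "Lab summary: " + ", ".join(f"{k}={v}" for k, v in labs.items())

def predict(aggregated_text: str) -> float:
    # Stand-in for the "predictor" LLM agent; returns a toy risk score.
    return 0.9 if "elevated" in aggregated_text else 0.1

def moma_pipeline(note: str, image_ref: str, labs: dict) -> float:
    # Unify all modalities as text, then predict from the aggregate.
    summaries = [summarize_image(image_ref), summarize_labs(labs)]
    aggregated = "\n".join([note] + summaries)
    return predict(aggregated)

risk = moma_pipeline("Pt with sepsis concern.", "cxr_001",
                     {"lactate": "4.1 (elevated)"})
```

The design point the sketch captures is that the aggregation layer works purely on text, so new modalities can be added by plugging in another summarizer agent.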
Speaker:
Jifan Gao, MS - University of Wisconsin-Madison
Authors:
Jifan Gao, MS - University of Wisconsin-Madison;
Mahmudur Rahman, PhD - University of Wisconsin-Madison;
Madeline Oguss, MS - University of Wisconsin-Madison;
Ann O’Rourke, MD, MPH - University of Wisconsin-Madison;
Randy Brown, MD, PhD - University of Wisconsin-Madison;
Anne Stey, MD, MSc - Northwestern University;
Anoop Mayampurath, PhD - University of Wisconsin-Madison;
Matthew Churpek, MD, MPH, PhD - University of Wisconsin-Madison;
Guanhua Chen, PhD - University of Wisconsin-Madison;
Majid Afshar, MD, MSCR - University of Wisconsin-Madison
Multi-Agent Framework for Automated Validation of Reporting Checklist Compliance in Observational Studies
Presentation Time: 02:12 PM - 02:24 PM
Abstract Keywords: Information Retrieval, Large Language Models (LLMs), Real-World Evidence Generation, Environmental Health and Climate Informatics
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
We developed a multi-agent framework using complementary large language models to automate validation of observational studies against reporting guidelines. Our system integrates a reasoner, an extractor, a validator, and a cross-checking mechanism across different LLM engines (GPT-4o, Claude 3.5) with retrieval-augmented generation. Testing on 30 papers showed high agreement rates (88-92%) between configurations and reduced assessment time by 95.8%. This approach addresses the time burden that deters guideline compliance while maintaining high accuracy, potentially improving research reporting quality.
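The extractor-validator-cross-check structure described above can be sketched as below. Function names and the rule-based "engines" are illustrative stand-ins; the authors' system drives these roles with LLM calls and retrieval-augmented generation.

```python
# Toy sketch of multi-agent checklist validation with cross-checking.

def extract_evidence(paper_text: str, checklist_item: str) -> str:
    # Extractor agent: pull the passage relevant to the checklist item.
    for sentence in paper_text.split(". "):
        if checklist_item.lower() in sentence.lower():
            return sentence
    return ""

def validate(evidence: str) -> bool:
    # Validator agent: judge whether the evidence satisfies the item.
    return bool(evidence)

def assess(paper_text: str, item: str, engines=("engine_a", "engine_b")) -> dict:
    # Run the same pipeline under each engine; disagreement flags the
    # item for human review (the cross-checking mechanism).
    verdicts = {e: validate(extract_evidence(paper_text, item)) for e in engines}
    agreed = len(set(verdicts.values())) == 1
    return {"verdicts": verdicts, "needs_human_review": not agreed}

result = assess("We report the study design in Methods. The cohort was retrospective.",
                "study design")
```

In a real deployment the two engines would be different LLMs, so disagreement is informative; here both toy engines are identical, which only illustrates the control flow.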
Speaker:
Chenyu Li, MS - University of Pittsburgh
Authors:
Chenyu Li, MS - University of Pittsburgh;
Seohu Lee, MS - Johns Hopkins University;
Yanshan Wang, PhD - University of Pittsburgh;
Michael Becich, MD, PhD - University of Pittsburgh School of Medicine;
Harold Lehmann, MD, PhD - Johns Hopkins University
MedHopQA: a disease-centered dataset for benchmarking multi-hop reasoning in biomedical question answering
Presentation Time: 02:24 PM - 02:36 PM
Abstract Keywords: Large Language Models (LLMs), Evaluation, Natural Language Processing, Machine Learning, Information Retrieval
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Large language models (LLMs) are increasingly used in biomedical and healthcare applications, where accurate question answering (QA) is critical. MedHopQA is a new, challenging benchmark of 1,000 question-answer pairs focused on diseases, genes, and chemicals. The dataset is intentionally derived from publicly available information that LLMs likely encountered during training, and its questions are constructed to require multi-step reasoning to produce an answer.
Speaker:
Rezarta Islamaj, PhD - National Institutes of Health, National Library of Medicine
Authors:
Nicholas Wan, Bachelor of Engineering - National Institutes of Health;
Robert Leaman - NCBI/NLM/NIH;
Guangzhi Xiong, BA - University of Virginia;
Qiao Jin, MD - National Institutes of Health;
Natalie Xie - National Institutes of Health;
W. John Wilbur, MD - Computer Craft Corporation;
Shubo Tian, PhD - National Library of Medicine;
Lana Yeganova, PhD - NIH;
Po-Ting Lai;
Chih-Hsuan Wei - NCBI;
Yifan Yang, BS - NCBI, NLM/NIH;
Joey Chan, MS - National Library of Medicine;
Yao Ge, MS - Emory University;
Qingqing Zhu, PhD - National Institutes of Health;
Zhizheng Wang, PhD - National Institutes of Health;
Zhiyong Lu, PhD - National Library of Medicine, NIH
MedFactEval: A Fact-Grounded, Scalable Approach to Evaluating AI-Generated Discharge Summaries
Presentation Time: 02:36 PM - 02:48 PM
Abstract Keywords: Large Language Models (LLMs), Evaluation, Artificial Intelligence
Primary Track: Foundations
MedFactEval offers a scalable, fact-grounded method to evaluate AI-generated discharge summaries. Clinicians define key facts expected in summaries, which are then assessed by both human experts and LLM judges. Results reveal high agreement on factual content, with GPT-4o achieving reliability comparable to a single physician evaluator. This approach streamlines evaluation, balancing human oversight with automated scalability, and lays the groundwork for advancing AI-driven clinical documentation.
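The fact-grounded evaluation loop — clinicians define key facts, a judge checks each fact against the generated summary — can be sketched as below. The substring-matching judge here is only a placeholder for the LLM judge the abstract describes, and all names are hypothetical.

```python
# Toy sketch of fact-grounded evaluation of a discharge summary.

def judge_fact(summary: str, fact: str) -> bool:
    # Placeholder judge: a real system would ask an LLM whether the
    # summary entails the fact, not do substring matching.
    return fact.lower() in summary.lower()

def fact_score(summary: str, key_facts: list) -> float:
    # Fraction of clinician-defined key facts the summary covers.
    hits = sum(judge_fact(summary, f) for f in key_facts)
    return hits / len(key_facts)

summary = "Discharged on metformin; follow up with cardiology in 2 weeks."
facts = ["metformin", "cardiology follow-up", "2 weeks"]
score = fact_score(summary, facts)
```

Note how the placeholder judge misses the paraphrase "follow up with cardiology", which is exactly the gap an LLM judge is meant to close.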
Speaker:
François Grolleau, MD, PhD - Stanford Center for Biomedical Informatics Research
Authors:
François Grolleau, MD, PhD - Stanford Center for Biomedical Informatics Research;
Emily Alsentzer, MS, PhD - Brigham and Women's Hospital;
Akshay Swaminathan, BA - Department of Biomedical Data Science, Stanford University;
Philip Chung, MD, MS - Stanford University;
Asad Aali, MS - Stanford University;
Jason Hom, MD - Department of Medicine, Stanford University;
April Liang, MD - Stanford University;
Kameron Black, DO, MPH - Stanford University;
Fateme Nateghi Haredasht, PhD - Stanford University;
Nigam Shah, MBBS - Stanford University;
Kevin Schulman;
Jonathan Chen, MD, PhD - Stanford University Hospital
A Multi-task Model for Simultaneous Prediction of Multiple Deterioration Events Among Hospitalized Children
Presentation Time: 02:48 PM - 03:00 PM
Abstract Keywords: Clinical Decision Support, Deep Learning, Critical Care, Pediatrics, Machine Learning
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Hospitalized children who experience clinical deterioration have increased risks of morbidity and mortality. To facilitate early detection of deterioration, we developed and externally validated a multi-task deep learning model that simultaneously predicts patient risk for mechanical ventilation, vasoactive infusion, high-flow nasal cannula, and mortality. The addition of auxiliary vital sign tasks to the multi-task model notably improved its performance, achieving higher AUC values than similarly constructed models that predict a single or composite deterioration outcome.
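Structurally, a multi-task model like the one described shares one representation across all outcomes, with a separate head per deterioration event and additional auxiliary heads (here, a vital-sign task) that regularize the shared trunk. The sketch below is a schematic illustration with made-up dimensions, not the authors' architecture.

```python
import math
import random

# Schematic multi-task network: shared trunk, one linear head per task.
random.seed(0)
N_FEATURES, N_HIDDEN = 8, 4

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

W_shared = [rand_vec(N_HIDDEN) for _ in range(N_FEATURES)]
heads = {
    "mechanical_ventilation": rand_vec(N_HIDDEN),
    "vasoactive_infusion": rand_vec(N_HIDDEN),
    "high_flow_nasal_cannula": rand_vec(N_HIDDEN),
    "mortality": rand_vec(N_HIDDEN),
    "aux_heart_rate": rand_vec(N_HIDDEN),  # auxiliary vital-sign task
}

def forward(x):
    # Shared trunk: h = tanh(x @ W_shared); then one linear head per task.
    h = [math.tanh(sum(x[i] * W_shared[i][j] for i in range(N_FEATURES)))
         for j in range(N_HIDDEN)]
    return {name: sum(h[j] * w[j] for j in range(N_HIDDEN))
            for name, w in heads.items()}

outputs = forward(rand_vec(N_FEATURES))
```

Because every head backpropagates through the same trunk during training, the auxiliary task can improve the primary outcome predictions, which is the effect the abstract reports.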
Speaker:
Sierra Strutz, PhD Student in Biomedical Data Science - University of Wisconsin-Madison
Authors:
Sierra Strutz, PhD Student in Biomedical Data Science - University of Wisconsin-Madison;
Kyle Carey, MPH - University of Chicago;
Harsh Sahu, MS - University of Wisconsin-Madison;
Priti Jani, MD, MPH - University of Chicago;
Emily Gilbert, MD - Loyola University;
Julie Fitzgerald, MD - Loyola University;
Nicholas Kuehnel, MD - University of Wisconsin-Madison;
Neil Munjal, MD - University of Wisconsin;
Majid Afshar, MD, MSCR - University of Wisconsin-Madison;
Matthew Churpek, MD, MPH, PhD - University of Wisconsin-Madison;
Anoop Mayampurath, PhD - University of Wisconsin-Madison
E Pluribus Unum: Statistical Methods for Combining Multiple Indicators of Capacity and Resource Utilization Into A Single Indicator of System Stress
Presentation Time: 03:00 PM - 03:12 PM
Abstract Keywords: Public Health, Patient Safety, Healthcare Quality
Primary Track: Applications
Programmatic Theme: Public Health Informatics
Large medical centers face challenges in responding to capacity demands when evaluating isolated metrics that each represent a different aspect of resource utilization. To overcome this, we developed a statistical methodology at the Children’s Hospital of Philadelphia for synthesizing nine capacity and resource-utilization indicators into a single percentile-based stress score. This score, which aligns with data on external stressors, is now being used to proactively respond to fluctuating demand.
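One simple way to combine heterogeneous indicators into a single percentile-based score is to rank each indicator against its own history and average the resulting percentiles. This sketch is an assumption about the general approach, with two made-up indicators; the paper's exact statistical method may differ.

```python
# Illustrative percentile-based aggregation of capacity indicators.

def percentile_rank(history, value):
    # Fraction of historical observations at or below the current value,
    # putting indicators with different units on a common 0-1 scale.
    return sum(h <= value for h in history) / len(history)

def stress_score(histories, current):
    # histories: indicator name -> list of past values
    # current:   indicator name -> today's value
    ranks = [percentile_rank(histories[k], current[k]) for k in current]
    return sum(ranks) / len(ranks)

histories = {"ed_census": [40, 50, 60, 70],
             "icu_occupancy": [0.6, 0.7, 0.8, 0.9]}
current = {"ed_census": 65, "icu_occupancy": 0.9}
score = stress_score(histories, current)
```

The percentile transform is what lets indicators measured in patients, beds, or ratios contribute equally to one unified score.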
Speaker:
Jen Lea Goodwin, MPH - Children's Hospital of Philadelphia
Authors:
Daniel Kabat, MS - Children's Hospital of Philadelphia;
Abdul Tariq, PhD - Children's Hospital of Philadelphia;
Max Hans, BS - Children's Hospital of Philadelphia;
Yael Greenberg, MPH - Children's Hospital of Philadelphia;
Emily Kane, MD - Children's Hospital of Philadelphia