11/17/2025 | 2:00 PM – 3:15 PM | M106/M107
S37: Monitoring the Machines: Safety, Usefulness and AI Transparency in NLP
Presentation Type: Oral Presentations
Automating Adjudication of Cardiovascular Events Using Large Language Models
Presentation Time: 02:00 PM - 02:12 PM
Abstract Keywords: Natural Language Processing, Artificial Intelligence, Information Extraction
Working Group: Natural Language Processing Working Group
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Cardiovascular events, such as heart attacks and strokes, remain a leading cause of mortality globally,
necessitating meticulous monitoring and adjudication in clinical trials. This process, traditionally performed
manually by clinical experts, is time-consuming, resource-intensive, and prone to inter-reviewer variability,
potentially introducing bias and hindering trial progress. This study addresses these critical limitations
by presenting a novel framework for automating the adjudication of cardiovascular events in clinical trials
using Large Language Models (LLMs). We developed a two-stage approach: first, employing an LLM-based
pipeline for event information extraction from unstructured clinical data and second, using an LLM-based
adjudication process guided by a Tree of Thoughts approach and clinical endpoint committee (CEC) guidelines.
Using cardiovascular event-specific clinical trial data, the framework achieved an F1-score of 0.82 for event
extraction and an accuracy of 0.68 for adjudication. Furthermore, we introduce the CLEART score, a
novel, automated metric specifically designed for evaluating the quality of AI-generated clinical reasoning in
adjudicating cardiovascular events. This approach demonstrates significant potential for substantially reducing
adjudication time and costs while maintaining high-quality, consistent, and auditable outcomes in clinical
trials. The reduced variability and enhanced standardization also allow for faster identification and mitigation
of risks associated with cardiovascular therapies.
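For readers less familiar with the two metrics reported above, the following is a minimal sketch of how an F1-score and an accuracy are typically computed. The counts and labels are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its reported results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for extracted items."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of cases where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical extraction counts (not from the study):
# 8 correct extractions, 2 spurious ones, 2 missed ones.
extraction_f1 = f1_score(tp=8, fp=2, fn=2)

# Hypothetical adjudication labels (not from the study).
model_labels = ["MI", "stroke", "no event", "MI"]
committee_labels = ["MI", "stroke", "MI", "MI"]
adjudication_accuracy = accuracy(model_labels, committee_labels)
```

F1 is the usual choice for extraction because it penalizes both spurious and missed items, whereas accuracy suits adjudication, where each event receives exactly one label.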
Speaker:
Sonish Sivarajkumar, MS
University of Pittsburgh
Authors:
Sonish Sivarajkumar, MS - University of Pittsburgh; Kimia Ameri, Ph.D. - Eli Lilly and Company; Chuqin Li, Ph.D. - Eli Lilly and Company; Yanshan Wang, PhD - University of Pittsburgh; Min Jiang, Ph.D. - Eli Lilly and Company;
Using Large Language Models to Critique EHR Content and Usability
Presentation Time: 02:12 PM - 02:24 PM
Abstract Keywords: Large Language Models (LLMs), Patient Safety, Evaluation
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Advancements in multimodal AI enable multimodal large language models (mLLMs) to analyze screenshots and videos of EHR interactions. By using mLLMs to detect usability flaws such as counterintuitive medication order displays or excessive navigation steps, health systems can systematically uncover "usability smells", design patterns that subtly degrade user experience and safety. Applying mLLMs to real-world EHR use cases may improve system design, enhance clinician efficiency, and reduce patient harm by addressing usability challenges in a data-driven and scalable manner.
Speaker:
Adam Wright, PhD
Vanderbilt University Medical Center
Authors:
Adam Wright, PhD - Vanderbilt University Medical Center; Laura Zahn, MS; Kimberly Garcia Flores, BA - Vanderbilt University Medical Center; Elise Russo - Vanderbilt University Medical Center; Dean Sittig, PhD - University of Texas Health Science Center at Houston;
When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior
Presentation Time: 02:24 PM - 02:36 PM
Abstract Keywords: Large Language Models (LLMs), Artificial Intelligence, Natural Language Processing, Evaluation
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Large language models (LLMs) trained to prioritize helpfulness risk generating medical misinformation by complying with illogical requests, even when they recognize that the requests are illogical. Evaluating five state-of-the-art LLMs, we found initial compliance rates reached 100%. However, prompt engineering and fine-tuning substantially improved logical consistency and the ability of the LLMs to resist generating misinformation. Targeted interventions are essential to safely leverage LLMs in healthcare and mitigate risks associated with medical misinformation.
Speaker:
Danielle Bitterman, MD
Harvard Medical School
Authors:
Shan Chen, M.S. - Harvard-MGB; Mingye Gao, PhD - MIT; Kuleen Sasse, Bachelor of Science - University of Alabama at Birmingham: Department of Biomedical Informatics and Data Science; Thomas Hartvigsen, PhD - University of Virginia; Brian Anthony, PhD - MIT; Lizhou Fan, PhD - NA; Hugo Aerts, PhD - Harvard Medical School; Jack Gallifant, MBBS - Mass General Brigham; Danielle Bitterman, MD - Harvard Medical School;
A Framework for Developing Metrics for AI Monitoring
Presentation Time: 02:36 PM - 02:48 PM
Abstract Keywords: Artificial Intelligence, Governance, Evaluation
Primary Track: Applications
Programmatic Theme: Clinical Informatics
The rapid adoption of AI in healthcare necessitates robust monitoring systems to ensure safety, effectiveness, and equity. We developed the IMPACC AI Monitoring Metrics Framework at UCSF through an iterative process involving literature reviews and stakeholder engagement. This framework builds on existing models, adding domains such as clinical and value-based outcomes, workflow impact, and healthcare professional and patient experiences to address AI monitoring needs. Prospective application has allowed identification and prioritization of AI monitoring metrics.
Speaker:
Jinoos Yazdany, MD MPH
UCSF
Authors:
Jinoos Yazdany, MD MPH - UCSF; Julia Adler-Milstein, PhD, FACMI - UCSF School of Medicine; Sarah Pollet, MPH - UCSF; Hossein Soleimani, PhD - UCSF Health; Orianna DeMasi, PhD - UCSF; Rhiannon Croci, BSN, RN - UCSF; Robert Thombley, BS - UCSF; Aris Oates, MD - UCSF Health; Maria Byron, MD - University of California, San Francisco; Cynthia Fenton, MD - UCSF; Sarah Beck, MDiv - UCSF; Sara Murray, MD, MAS - UCSF;
Survivorship Navigator: Personalized Survivorship Care Plan Generation using Large Language Models
Presentation Time: 02:48 PM - 03:00 PM
Abstract Keywords: Large Language Models (LLMs), Chronic Care Management, Clinical Decision Support, Natural Language Processing, Cancer Prevention, Information Extraction, Information Retrieval, Evaluation
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Cancer survivorship care plans (SCPs) are critical tools for guiding long-term follow-up care of cancer survivors. Yet their widespread adoption remains hindered by the significant clinician burden of the time- and labor-intensive SCP creation process. Current practice requires clinicians to extract and synthesize treatment summaries from complex patient data, apply relevant survivorship guidelines, and generate a care plan with personalized recommendations. In this study, we systematically explore the potential of large language models (LLMs) for automating SCP generation and introduce Survivorship Navigator, a framework designed to streamline SCP creation and enhance integration with clinical systems. We evaluate our approach through automated assessments and a human expert study, demonstrating that Survivorship Navigator outperforms baseline methods, producing SCPs that are more accurate, guideline-compliant, and actionable.
Speaker:
Jathurshan Pradeepkumar, BS
University of Illinois, Urbana-Champaign
Authors:
Jathurshan Pradeepkumar, BS - University of Illinois, Urbana-Champaign; Shivam Pankaj Kumar, MS - University of Illinois Urbana-Champaign; Courtney Bryce Reamer, MD - Northwestern University; Marie Dreyer, MD - Northwestern University; Jyoti Patel, MD - Northwestern University; David Liebovitz, MD - Northwestern University Feinberg School of Medicine; Jimeng Sun - University of Illinois at Urbana Champaign;
Improving electronic health record processing of large language models via retrieval-augmented generation: A case study on dietary supplements
Presentation Time: 03:00 PM - 03:12 PM
Abstract Keywords: Information Retrieval, Large Language Models (LLMs), Natural Language Processing, Artificial Intelligence
Primary Track: Applications
Large language models (LLMs) excel in natural language processing (NLP) but struggle with domain-specific complexities in electronic health records (EHRs). We demonstrate that retrieval-augmented generation (RAG) enhances LLMs for dietary supplement (DS) information extraction. Testing models such as Llama-3 with diverse retrievers on tasks including entity recognition and usage classification, we find that task-aligned retrieval outperforms reliance on model size or specialization. Smaller general models paired with optimized retrievers match or exceed specialized counterparts: structured retrieval aids complex tasks (e.g., triple extraction), while semantic retrieval improves classification. Results challenge assumptions that larger or domain-specific models are superior, emphasizing dynamic knowledge integration over brute-force scaling. This approach offers practical strategies for clinical NLP, enabling efficient EHR analysis without massive resources. Prioritizing retrieval strategies over model size advances tools for evidence-based healthcare, highlighting adaptability and cost-effectiveness in real-world medical applications.
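As a rough illustration of the retrieval-augmented pattern described above, the sketch below retrieves the most similar labeled snippets for a new note and places them in a prompt. Everything in it is an assumption made for illustration: the token-overlap retriever, the function names, and the example snippets are hypothetical and are not the retrievers, models, or data used in the study, and no LLM is actually called.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus entries sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,  # sorted() is stable, so ties keep corpus order
    )
    return ranked[:k]


def build_prompt(note: str, examples: list[str]) -> str:
    """Assemble a prompt that places retrieved examples before the new note."""
    context = "\n".join(f"- {example}" for example in examples)
    return (
        "Classify the dietary supplement usage status in the note.\n"
        f"Similar labeled examples:\n{context}\n"
        f"Note: {note}\n"
        "Usage status:"
    )


# Hypothetical labeled snippets (invented for this sketch).
corpus = [
    "Patient takes fish oil daily for triglycerides. -> usage: continuing",
    "Stopped melatonin last month due to drowsiness. -> usage: discontinued",
    "Blood pressure well controlled on lisinopril. -> usage: not applicable",
]

note = "Patient stopped fish oil due to nausea."
retrieved = retrieve(note, corpus, k=2)
prompt = build_prompt(note, retrieved)
```

Swapping `retrieve` for a different strategy (for example, an embedding-based or structure-aware retriever) changes which examples reach the model without changing the model itself, which is the kind of design choice the abstract compares.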
Speaker:
Zaifu Zhan, MS
University of Minnesota Twin Cities
Authors:
Zaifu Zhan, MS - University of Minnesota Twin Cities; Shuang Zhou, PhD - University of Minnesota Twin Cities; Jiawen Deng, High school - University of Minnesota; Rui Zhang, PhD, FAMIA, FACMI - University of Minnesota, Twin Cities;
A Framework for Developing Metrics for AI Monitoring
Category: Podium Abstract
11/17/2025 03:15 PM (Eastern Time (US & Canada))