11/18/2025 | 9:45 AM – 11:00 AM | Room 9
S72: Beyond the Bench: AI Tools Supporting Health, Learning, and Discovery
Presentation Type: LIEAF
Predicting Underperforming Internal Medicine Residents using Machine Learning and Audit Logs: A Pilot Study
Presentation Time: 09:45 AM - 10:00 AM
Abstract Keywords: Education and Training, Workflow, Machine Learning
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
Residency training is essential for medical graduates, but detecting underperformance in advance remains challenging. This study developed a machine learning (ML) model using relative performance data and audit logs from 31 PGY1 residents over six months. Ridge logistic regression performed best (AUROC = 0.9125, AUPRC = 0.8125) by the fourth month of residency. These results suggest that ML models can predict underperformance early, enabling timely interventions. Further research with larger datasets is needed for validation.
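The headline metrics in this abstract (AUROC and AUPRC) summarize how well a model's risk scores rank true cases above non-cases. A minimal pure-Python sketch of AUROC via its rank-sum (Mann–Whitney) interpretation, using hypothetical scores rather than the study's data:

```python
# AUROC as the probability that a randomly chosen positive case
# receives a higher score than a randomly chosen negative case
# (ties count as half). Hypothetical data, not the study's.
def auroc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]           # 1 = underperforming (illustrative)
scores = [0.9, 0.6, 0.7, 0.2, 0.1]  # model risk scores (illustrative)
print(auroc(labels, scores))        # 5 of 6 pos/neg pairs ranked correctly
```

In practice one would use a library routine (e.g. scikit-learn's `roc_auc_score`); the hand-rolled version above just makes the ranking interpretation of the reported 0.9125 explicit.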
Speaker:
Danny Wu, PhD - University of North Carolina at Chapel Hill
Authors:
Danny Wu, PhD - University of North Carolina at Chapel Hill;
Tzu-Chun Wu, PhD - University of Cincinnati;
Scott Vennemeyer, BS - University of Cincinnati, College of Medicine;
Michelle Knopp, MD - Cincinnati Children's Hospital Medical Center;
Benjamin Kinnear, MD, MEd - Cincinnati Children's Hospital Medical Center;
Matt Kelleher, MD, MEd - Cincinnati Children's Hospital Medical Center;
Eric Warm, MD - University of Cincinnati;
Embedding Generative AI into Health Informatics Education and Assessing Learning Outcomes
Presentation Time: 10:00 AM - 10:15 AM
Abstract Keywords: Artificial Intelligence, Teaching Innovation, Curriculum Development, Education and Training, Legal, Ethical, Social and Regulatory Issues
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
Generative artificial intelligence (Gen AI), and in particular the large language models (LLMs) used to produce text-based content, represents the latest progress in machine learning (ML), able to work with large unstructured data sets containing information in any form: discrete, text, or graphics. Its real and potential implications in medicine and education are wide, from quality and safety, to summarization of electronic health records, to integration of unstructured data into predictive medical analytics. LLMs can be utilized as tools for grading student performance as well as embedded into course content to teach students new skills. There are debates among academics about the advantages and ethical perils of LLMs, yet it is clear that health informatics programs must embrace Gen AI to ensure students are prepared to participate in a workforce that uses it. The University of Illinois at Chicago MSHI program expands upon the strengths, weaknesses, opportunities, and threats (SWOT) framework for analyzing Gen AI presented at the AMIA Annual Symposium in 2023 by reporting on our progress in the Gen AI curriculum journey: development of the new curriculum, AI knowledge domains and competencies, and course options for students to gain job-market-ready Gen AI skills. We do so in partnership with the University of San Francisco, another premier informatics program, by co-developing and co-delivering Gen AI course content. We quantitatively and qualitatively report on students' learning outcomes and reflections following curriculum updates performed simultaneously in two distinct programs delivered in online and hybrid formats.
Speaker:
Jacob Krive, PhD - University of Illinois at Chicago
Authors:
Freddie Seba, MBA, MA, EdD Candidate - University of San Francisco;
Miriam Isola, DrPH, FAMIA, CPHIMS - University of Illinois at Chicago;
Laura Mills, MA;
Mohan Zalake;
Jacob Krive, PhD - University of Illinois at Chicago;
Detecting Reference Errors in Scientific Literature with Large Language Models
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Artificial Intelligence, Large Language Models (LLMs), Natural Language Processing
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and time-consuming to detect, posing a significant threat to the integrity of scientific literature. To support automatic detection of reference errors, we evaluated the ability of large language models in OpenAI’s GPT family to detect quotation errors. Specifically, we prepared an expert-annotated, general-domain dataset of statement-reference pairs from journal articles, one-third of which is in biomedicine. Large language models were evaluated in different settings with varying amounts of reference information provided by retrieval augmentation. Results showed that large language models are able to detect erroneous citations with limited context and without fine-tuning. This study contributes to the growing literature that seeks to utilize artificial intelligence to assist in the writing, reviewing, and publishing of scientific papers as well as grounding of language model responses.
Speaker:
Tianmai Zhang, MA - University of Washington
Authors:
Tianmai Zhang, MA - University of Washington;
Neil Abernethy, PhD - University of Washington;
EPPCMinerBen: A Novel Benchmark for Evaluating Large Language Models on Electronic Patient-Provider Communication in Cancer Care
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Information Extraction, Large Language Models (LLMs), Data Mining
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
Effective communication between patients and healthcare providers is critical, especially in cancer care. Secure messaging via patient portals provides valuable data but is challenging to analyze. We propose EPPCMinerBen, a benchmark containing two tasks: EPPC code classification and EPPC evidence extraction. This benchmark evaluates large language models (LLMs) for their ability to accurately identify communication patterns and extract meaningful insights, supporting the development of natural language processing (NLP) models that improve patient-provider interactions and personalized healthcare.
Speaker:
Yan Wang, PhD - Yale University
Authors:
Yan Wang, PhD - Yale University;
Linhai Ma, PhD - Yale University;
Sameer Gopali, MS in Computer Science - Yale University;
Srivani Talakokkul, MPH - Yale University;
Samah Fodeh-Jarad, PhD - Yale University;
Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Artificial Intelligence, Bioinformatics, Diversity, Equity, Inclusion, and Accessibility, Information Retrieval
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
Medical Question-Answering (QA) systems based on Retrieval-Augmented Generation (RAG) are promising for clinical decision support due to their capability to integrate external knowledge, thus reducing inaccuracies inherent in standalone large language models (LLMs). However, these systems may unintentionally propagate biases associated with sensitive demographic attributes like race, gender, and socioeconomic factors. This study systematically evaluates demographic biases within medical RAG pipelines across multiple QA benchmarks, including MedQA, MedMCQA, MMLU, and EquityMedQA. We quantify disparities in retrieval consistency and answer correctness by generating and analyzing queries sensitive to demographic variations. We further implement and compare several bias mitigation strategies—including Chain-of-Thought reasoning, Counterfactual filtering, Adversarial prompt refinement, and Majority Vote aggregation—to address identified biases. Experimental results reveal significant demographic disparities, highlighting that Majority Vote aggregation improves accuracy and fairness metrics. Our findings underscore the critical need for explicitly fairness-aware retrieval methods and prompt engineering strategies to develop truly equitable medical QA systems.
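Of the mitigation strategies named above, Majority Vote aggregation is the one the abstract reports as improving both accuracy and fairness: the same question is posed across demographic variants and the modal answer is kept. A minimal sketch with hypothetical answers (the paper's actual pipeline details are not given in the abstract):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer across demographic query
    variants; ties break toward the first-seen answer, since
    Counter.most_common is stable for equal counts."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical: one multiple-choice question asked with different
# demographic attributes substituted into the prompt.
variant_answers = ["B", "B", "A", "B", "C"]
print(majority_vote(variant_answers))  # prints "B"
```

The intuition is that an answer which flips when only a demographic attribute changes is suppressed by the vote, while a demographically stable answer survives.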
Speaker:
Yuelyu Ji, PhD - University of Pittsburgh
Authors:
Yuelyu Ji, PhD - University of Pittsburgh;
Hang Zhang, MS - University of Pittsburgh;
Yanshan Wang, PhD - University of Pittsburgh;