11/18/2025 | 8:00 AM – 9:15 AM | Room 7
S60: House of Models: Where Diagnosis Meets Machine Learning
Presentation Type: Oral Presentations
Trustworthy and Uncertainty-Aware AI for Predicting Respiratory Complications Following Total Hip and Knee Arthroplasty
Presentation Time: 08:00 AM - 08:12 AM
Abstract Keywords: Clinical Decision Support, Machine Learning, Artificial Intelligence
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Total hip and knee arthroplasty (THA/TKA) are among the fastest-growing surgical procedures in the United States and are designed to restore mobility and improve quality of life in individuals with joint disorders. Despite their benefits, these procedures may carry significant risks, including major respiratory complications. Prompt identification of patients at increased risk is essential for optimizing preoperative treatment, reducing adverse outcomes, and increasing patient safety. In this study, we propose an uncertainty-aware and trustworthy artificial intelligence (AI) framework to predict the likelihood of major respiratory complications, including unplanned intubation, failure to wean from ventilation, and postoperative pneumonia, occurring during the index hospitalization and within 30 days following both primary and revision THA and TKA procedures. Unlike traditional risk models, our framework explicitly quantifies prediction uncertainty while maintaining high interpretability, enabling proactive and personalized clinical interventions. We assessed four machine learning (ML) models, namely Random Forest (RF), XGBoost, Logistic Regression (LR), and Artificial Neural Networks (ANNs), to predict three postoperative respiratory outcomes. The ML models demonstrated strong predictive performance: Random Forest achieved an F1-score of 0.87 for respiratory complications in THA, while ANNs outperformed the other models in TKA, also attaining an F1-score of 0.87.
Speaker:
Ahmad P. Tafti, PhD
University of Pittsburgh
Authors:
Farnaz Rezvani, MS - Brunel University of London; Kate Towsen, BS - University of Pittsburgh; Zoe Menezes, BS - University of Pittsburgh; Anatea Einhorn, BS - New York University; Jevaughn Davis, BS - The George Washington University; Puneet Gupta, BS - The George Washington University; Johannes F. Plate, MD, PhD - University of Pittsburgh; Chloe Fox, MS - University of Pittsburgh; Nicole Myers, MS - University of Pittsburgh; Ahmad P. Tafti, PhD - University of Pittsburgh;
Developing Large Language Model-based Pipeline for Identification of Disease Diagnosis: A Case Study on Identifying Newly Diagnosed Multiple Myeloma in Veterans Health Administration Electronic Health Records
Presentation Time: 08:12 AM - 08:24 AM
Abstract Keywords: Large Language Models (LLMs), Natural Language Processing, Artificial Intelligence, Population Health, Information Extraction, Informatics Implementation, Cancer Prevention, Real-World Evidence Generation
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Accurately identifying disease diagnoses from electronic health records (EHRs) is crucial for clinical and biomedical research; however, this is challenging when diagnoses are complex and require data from several sources, as with multiple myeloma (MM) and its precursor condition, monoclonal gammopathy of undetermined significance (MGUS). Leveraging the national Veterans Health Administration's EHRs, we developed and validated a large language model (LLM)-based pipeline that uses only clinical notes from randomly selected patients identified via ICD codes for MGUS/MM. Among the LLMs and learning approaches evaluated, the Llama-3-8B-based pipeline with prompt engineering achieved the best performance. Relying solely on clinical notes, this pipeline outperformed rule-based and machine learning-based methods for identifying MGUS and achieved comparable performance for MM. Using notes alone also removed preprocessing steps and shortened the overall processing time. Our work demonstrates that the developed LLM-based pipeline can efficiently and effectively identify disease diagnoses, replacing inefficient rule- or machine learning-based natural language processing methods and manual chart abstraction.
Speaker:
Mei Wang, MS
Washington University in St. Louis
Authors:
Mei Wang, MS - Washington University in St. Louis; Yuan-Hung Kuan, BS - Washington University in St. Louis; Patrick Alba, MS - United States Department of Veterans Affairs; Qiwei Gan; Martin Schoen, MD, MPH - Saint Louis University School of Medicine; Theodore Thomas, MD, MPHS - Washington University in St. Louis; Jr-Shin Li, PhD - Washington University in St. Louis; Su-Hsin Chang, PhD, SM - Washington University in St. Louis;
Predicting Early-Onset Colorectal Cancer with Large Language Models
Presentation Time: 08:24 AM - 08:36 AM
Abstract Keywords: Machine Learning, Information Extraction, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Public Health Informatics
The incidence rate of early-onset colorectal cancer (EoCRC, age < 45) has increased every year, but this population is younger than the recommended age established by national guidelines for cancer screening. In this paper, we applied 10 different machine learning models to predict EoCRC and compared their performance with advanced large language models (LLMs), using patient conditions, lab results, and observations from the 6 months of the patient journey prior to the CRC diagnosis. We retrospectively identified 1,953 CRC patients from multiple health systems across the United States. The results demonstrated that the fine-tuned LLM achieved an average of 73% sensitivity and 91% specificity.
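For readers less familiar with the two reported metrics, the following minimal sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical, chosen only so that the rates match the 73% and 91% quoted in the abstract; they are not data from the study, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the model flags (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-cases that the model correctly clears (true negative rate)."""
    return tn / (tn + fp)


# HYPOTHETICAL counts per 100 cases and 100 non-cases, not study data.
tp, fn = 73, 27
tn, fp = 91, 9

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under these assumed counts, 27 of every 100 true cases would be missed and 9 of every 100 non-cases would be flagged.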
Speaker:
Wilson Lau, PhD
Truveta
Authors:
Wilson Lau, PhD - Truveta; Youngwon Kim, PhD - Truveta; Sravanthi Parasa, MD - Swedish Medical Center; Md Enamul Haque, PhD - Truveta; Anand Oka, PhD - Truveta; Jay Nanduri, Chief Technology Officer - Truveta;
FCFNets: A Factual and Counterfactual Learning Framework for Enhanced Hepatic Fibrosis Prediction in Young Adults with T2D
Presentation Time: 08:36 AM - 08:48 AM
Abstract Keywords: Bioinformatics, Artificial Intelligence, Data Mining, Deep Learning, Diagnostic Systems, Precision Medicine
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Hepatic fibrosis poses a significant health risk for young adults with type 2 diabetes (T2D). We propose FCFNets, a novel factual and counterfactual learning framework that leverages electronic health records (EHRs) to predict hepatic fibrosis in young adults with T2D while addressing class imbalance and increasing interpretability. We designed a hybrid UNDO oversampling strategy, combining random and dissimilar oversampling, that improves dataset diversity and model robustness. FCFNets also integrates SHAP-based global and instance-level explanations, alongside feature interaction analysis, providing insights into critical risk factors associated with hepatic fibrosis. The results show that our proposed model outperforms various baseline methods, with high sensitivity (0.846) and accuracy (0.768), while delivering counterfactual explanations. Hyper-parameter tuning and dropout analysis further refine the model. This study demonstrates FCFNets's potential for early detection and personalized management of hepatic fibrosis, paving the way for interpretable AI applications in precision medicine.
Speaker:
Qiang Yang, PhD
University of Florida
Authors:
Qiang Yang, PhD - University of Florida; Anu Sharma, PhD - University of Florida; Daphne Calin, Bachelor - University of Florida; Chloe de Crecy, PhD - University of Florida; Rohit Inampudi, Bachelor - University of Florida; Rui Yin, PhD - University of Florida;
ProtoBERT-LoRA: Parameter-Efficient Prototypical Finetuning for Immunotherapy Study Identification
Presentation Time: 08:48 AM - 09:00 AM
Abstract Keywords: Artificial Intelligence, Natural Language Processing, Machine Learning, Deep Learning, Information Retrieval, Data Mining
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Identifying immune checkpoint inhibitor (ICI) studies in genomic repositories like Gene Expression Omnibus (GEO) is vital for cancer research yet remains challenging due to semantic ambiguity, extreme class imbalance, and limited labeled data in low-resource settings. We present ProtoBERT-LoRA, a hybrid framework that combines PubMedBERT with prototypical networks and Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model enforces class-separable embeddings via episodic prototype training while preserving biomedical domain knowledge. Our dataset was divided as follows: Training (20 positive, 20 negative), Prototype Set (10 positive, 10 negative), Validation (20 positive, 200 negative), and Test (71 positive, 765 negative). Evaluated on the test dataset, ProtoBERT-LoRA achieved an F1-score of 0.624 (precision: 0.481, recall: 0.887), outperforming the rule-based system, machine learning baselines, and fine-tuned PubMedBERT. Application to 44,287 unlabeled studies reduced manual review effort by at least 82%. Ablation studies confirmed that combining prototypes with LoRA improved performance by 29% over standalone LoRA.
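The reported test metrics can be cross-checked with a short calculation. The sketch below recomputes the F1-score from the quoted precision and recall, then infers approximate confusion-matrix counts from the 71 test positives. The inferred counts are an assumption derived by rounding; the abstract does not report them.

```python
# Figures quoted in the abstract.
precision = 0.481
recall = 0.887
test_positives = 71


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)

# INFERRED by rounding, not reported in the abstract.
tp = round(recall * test_positives)     # true positives
predicted_pos = round(tp / precision)   # all studies flagged positive
fp = predicted_pos - tp                 # false positives
fn = test_positives - tp                # missed positives

print(f"F1={f1:.3f}, TP={tp}, FP={fp}, FN={fn}")
```

The recomputed F1 agrees with the reported 0.624. Under the rounding assumption, the model would recover 63 of the 71 positive studies while flagging 68 negatives, which fits a screening use case where high recall matters more than precision.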
Speaker:
Shijia Zhang, Master
Johns Hopkins University
Authors:
Shijia Zhang, Master - Johns Hopkins University; Xiyu Ding, MS - Johns Hopkins School of Medicine; Kai Ding, PhD - Takeda Pharmaceutical; Jacob Zhang, PhD - Takeda Pharmaceutical; Kevin Galinsky, PhD - Takeda Pharmaceutical; Mengrui Wang, MS - Boston University; Ryan Mayers, MS - University of Maryland; Zheyu Wang, PhD - Johns Hopkins University; Hadi Kharrazi, MD, PhD, FAMIA, FACMI - Johns Hopkins University;
Evaluating the External Validity of LAPS-2 Acuity Measure
Presentation Time: 09:00 AM - 09:12 AM
Abstract Keywords: Healthcare Quality, Patient Safety, Evaluation
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
This study compared the predictive performance of the LAPS-2 score at different time points and in combination with the Charlson Comorbidity Index against the Vizient Expected Mortality probability. The findings support the integration of LAPS-2 into clinical research workflows and highlight the potential for using multiple predictors in combination to improve risk adjustment.
Speaker:
Isaac Michaels, MPH
NewYork-Presbyterian
Authors:
Gregory Hruby, PhD - Mount Sinai Health System; Jason Adelman, MD - Columbia University Irving Medical Center; Benjamin Ranard, MD, MSHP - Center for Patient Safety Science, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA; Department of Biomedical Informatics, Columbia University, New York, NY, USA;
Developing Large Language Model-based Pipeline for Identification of Disease Diagnosis: A Case Study on Identifying Newly Diagnosed Multiple Myeloma in Veterans Health Administration Electronic Health Records
Category
Paper - Student
11/18/2025 09:15 AM (Eastern Time (US & Canada))