3/12/2025 | 1:30 PM – 3:00 PM | Monongahela
S26: Generative AI Application and Evaluation
Presentation Type: Podium Abstract
Improving Adverse Event Signal Detection using a Generative Model Incorporating Semantic Embedding
Presentation Time: 01:30 PM - 01:45 PM
Abstract Keywords: Drug Discovery, Repurposing, and Side-effect Discovery, Machine Learning, Generative AI, and Predictive Modeling, EHR-based Phenotyping
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Adverse drug reactions (ADRs) significantly impact patient safety and healthcare costs. Traditional ADR detection methods face challenges due to biased reporting systems like FAERS and often overlook numeric data such as lab values and drug doses in electronic health records (EHRs). This study investigates the use of transformer-based models, specifically GPT-2, to enhance ADR signal detection using EHR data. We utilized two EHR datasets, MIMIC-IV and the University of Washington EHR, and evaluated models with and without numeric data integration through value-aware and graded embeddings. Our findings demonstrate that transformer-based models consistently outperform traditional disproportionality metrics like PRR and TreeScan. Incorporating numeric clinical data significantly improves model performance, with value-aware transformers yielding the highest accuracy in detecting ADRs. These results highlight the potential of advanced machine learning models in pharmacovigilance efforts. Future research will focus on exploring polypharmacy effects and validating the generalizability of these models across diverse healthcare systems.
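For reference, below is a minimal Python sketch of the proportional reporting ratio (PRR), the disproportionality baseline the abstract compares against. The contingency counts and the PRR > 2 rule of thumb are illustrative only, not study data.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from a 2x2 drug-event contingency table.

    a: reports with the drug AND the adverse event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports with neither the drug nor the event
    """
    exposed_rate = a / (a + b)      # event rate among drug-exposed reports
    background_rate = c / (c + d)   # event rate among all other reports
    return exposed_rate / background_rate

# Illustrative counts only; a signal is commonly flagged when the PRR exceeds 2.
print(proportional_reporting_ratio(a=30, b=970, c=200, d=98800))
```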
Speaker(s):
Yifan Wu, MPH
University of Washington, Biomedical & Health Informatics
Author(s):
Trevor Cohen, MBChB, PhD - Biomedical Informatics and Medical Education, University of Washington; Ian de Boer, MD, MS - Nephrology, University of Washington; Yifan Wu, MPH - University of Washington, Biomedical & Health Informatics;
Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation
Presentation Time: 01:45 PM - 02:00 PM
Abstract Keywords: Natural Language Processing, Patient-centered Research and Care, Health Literacy Issues and Solutions, Machine Learning, Generative AI, and Predictive Modeling
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Low back pain (LBP) is a leading cause of disability globally. Following the onset of LBP and subsequent treatment, adequate patient education is crucial for improving functionality and long-term outcomes. Despite advancements in patient education strategies, significant gaps persist in delivering personalized, evidence-based information to patients with LBP. Recent advancements in large language models (LLMs) and generative artificial intelligence (GenAI) have demonstrated the potential to enhance patient education. However, their application and efficacy in delivering educational content to patients with LBP remain underexplored and warrant further investigation. In this study, we introduce a novel approach utilizing LLMs with Retrieval-Augmented Generation (RAG) and few-shot learning to generate tailored educational materials for patients with LBP. Physical therapists manually evaluated our model responses for redundancy, accuracy, and completeness using a Likert scale. In addition, the readability of the generated education materials was assessed using the Flesch Reading Ease score. The findings demonstrate that RAG-based LLMs outperform traditional LLMs, providing more accurate, complete, and readable patient education materials with less redundancy. However, our analysis reveals that the generated materials are not yet ready for use in clinical practice. This study underscores the potential of AI-driven models utilizing RAG to improve patient education for LBP; however, significant challenges remain in ensuring the clinical relevance and granularity of content generated by these models.
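As a concrete reference for the readability evaluation, below is a minimal sketch of the Flesch Reading Ease formula the abstract reports. The syllable counter is a crude vowel-group heuristic and the sample sentence is illustrative, not material from the study.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels; not a true syllabifier.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Higher scores indicate easier reading.
print(flesch_reading_ease("Rest, gentle movement, and short walks can ease low back pain."))
```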
Speaker(s):
Author(s):
Yi-fei Zhao, BS - University of Pittsburgh; Allyn Bove, PhD, DPT - University of Pittsburgh; David Thompson, DPT - University of Pittsburgh; James Hill, DPT - University of Pittsburgh; Yi Xu, BS - University of Pittsburgh; Yufan Ren, BS - University of Pittsburgh; Andrea Hassman, BS - University of Pittsburgh; Leming Zhou, PhD - University of Pittsburgh; Yanshan Wang, PhD - University of Pittsburgh;
Evaluating Generative AI’s Ability to Identify Lung and Kidney Cancer Subtypes in Publicly Available Structured Genetic Datasets
Presentation Time: 02:00 PM - 02:15 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Genomics/Omic Data Interpretation, Informatics Research/Biomedical Informatics Research Methods
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Proactive Machine Learning in Biomedical Applications: The Power of Generative AI and Reinforcement Learning
Recent advancements in AI have introduced large language models (LLMs). This study examines GPT-4's ability to predict lung and kidney cancer subtypes and compares its performance to that of traditional machine learning (ML) models. In the lung cancer experiments, the GPT-4 configuration with a fixed temperature of 1 ("GPT Fixed Temp=1") achieved the best results, while traditional ML models outperformed the LLMs in the kidney cancer experiments. The study underscores the potential of LLMs for genetic analysis, though further development is needed to match the performance of traditional ML models.
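As context for the comparison, below is a sketch of the kind of traditional ML baseline the abstract pits GPT-4 against: a random forest over structured mutation features. The file name, column names, and subtype labels are hypothetical placeholders, not the study's actual data layout.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical structured genetic dataset: one row per tumor, binary gene-mutation
# indicator columns plus a "subtype" label (e.g., adenocarcinoma vs. squamous cell).
df = pd.read_csv("lung_mutations.csv")
X = df.drop(columns=["subtype"])
y = df["subtype"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```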
Speaker(s):
Ethan Hillis, MS
Institute for Informatics at Washington University School of Medicine in St. Louis
Author(s):
Kriti Bhattarai, PhD Candidate - Institute for Informatics at Washington University; Zachary Abrams, PhD - Institute for Informatics at Washington University School of Medicine in St. Louis;
Deconstructing Complex Diagnostic Criteria and Leveraging Generative Artificial Intelligence to Facilitate Multiple Sclerosis Diagnosis
Presentation Time: 02:15 PM - 02:30 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Data Mining and Knowledge Discovery, Knowledge Representation, Management, or Engineering, Secondary Use of EHR Data, Patient-centered Research and Care
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Multiple sclerosis (MS) is challenging to diagnose due to complex diagnostic criteria, leading to delayed or missed diagnoses. Using GPT-4 to augment a decision tree based on the MS diagnostic criteria, our approach correctly determined MS diagnosis status from the first neurology note (without seeing the neurologist’s assessment) in 74% of the 125 patients, although hallucinations (incoherence, overreliance) remain. Generative AI can potentially facilitate complex disease diagnosis but requires rigorous validation of computable algorithms.
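As an illustration only (not the authors' algorithm), the sketch below shows one way to pair a hand-built diagnostic decision tree with an LLM that answers each node's yes/no question from a clinical note. The node questions are simplified stand-ins for the MS diagnostic criteria, and `ask_llm` is an unimplemented stub.

```python
# Simplified two-level tree; leaves are conclusions, inner nodes are yes/no questions.
TREE = {
    "question": "Does the note document two or more clinical attacks?",
    "yes": {
        "question": "Is there objective evidence of two or more lesions?",
        "yes": "MS diagnosis supported by this note",
        "no": "Additional evidence required",
    },
    "no": "Criteria not met from this note alone",
}

def ask_llm(question: str, note: str) -> bool:
    """Stub: in practice, prompt an LLM with the note and parse a yes/no answer."""
    raise NotImplementedError

def traverse(node, note: str) -> str:
    if isinstance(node, str):          # reached a leaf / conclusion
        return node
    branch = "yes" if ask_llm(node["question"], note) else "no"
    return traverse(node[branch], note)
```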
Speaker(s):
Shruthi Venkatesh, BS
University of Pittsburgh
Author(s):
Marisa DelSignore, BA - University of Pittsburgh School of Medicine; Xizhi Wu, Master of Science - University of Pittsburgh; Michele Morris, BA - University of Pittsburgh; Wesley Kerr, MD, PhD - University of Pittsburgh; Shyam Visweswaran, MD PhD - University of Pittsburgh; Yanshan Wang, PhD - University of Pittsburgh; Zongqi Xia, MD, PhD - University of Pittsburgh School of Medicine;
Automating and Evaluating Large Language Models for Accurate Text Summarization Under Zero-Shot Conditions
Presentation Time: 02:30 PM - 02:45 PM
Abstract Keywords: Reproducible Research Methods and Tools, Data Mining and Knowledge Discovery, Clinical and Research Data Collection, Curation, Preservation, or Sharing
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Automated text summarization (ATS) is crucial for collecting specialized, domain-specific information. Zero-shot learning (ZSL) allows large language models (LLMs) to respond to prompts about information not included in their training, playing a vital role in this process. This study evaluates LLMs' effectiveness in generating accurate summaries under ZSL conditions and explores the use of retrieval-augmented generation (RAG) and prompt engineering to enhance factual accuracy and understanding. We combined LLMs with summarization modeling, prompt engineering, and RAG, evaluating the summaries using the METEOR metric and keyword frequencies visualized as word clouds. Results indicate that LLMs are generally well suited to ATS tasks, handling specialized information under ZSL conditions when paired with RAG. However, web-scraping limitations prevent a single generalized retrieval mechanism, and challenges such as goal misgeneralization remain to be addressed. Future research should focus on solutions to these issues.
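For reference, the sketch below scores a candidate summary with METEOR, the metric the abstract uses, assuming NLTK with the WordNet data available. The reference and candidate strings are illustrative only.

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)   # METEOR uses WordNet for synonym matching
nltk.download("omw-1.4", quiet=True)

reference = "The guideline recommends early mobilization and patient education."
candidate = "Early mobilization and educating the patient are recommended."

# Recent NLTK versions expect pre-tokenized input: a list of reference token lists
# and one hypothesis token list.
score = meteor_score([reference.split()], candidate.split())
print(f"METEOR: {score:.3f}")
```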
Speaker(s):
Hilda Klasky, FAMIA
Oak Ridge National Laboratory (ORNL) - UT Battelle
Author(s):
Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning
Presentation Time: 02:45 PM - 03:00 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Natural Language Processing, Informatics Research/Biomedical Informatics Research Methods
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Proactive Machine Learning in Biomedical Applications: The Power of Generative AI and Reinforcement Learning
Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarizing doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined prompt-tuning strategies, the size of soft prompts, and the few-shot learning ability of GatorTronGPT, a generative clinical LLM developed using 277 billion clinical and general English words with up to 20 billion parameters. We compared GatorTronGPT with a previous solution based on fine-tuning a widely used T5 model, using the MTS-DIALOG clinical benchmark dataset. The experimental results show that the GatorTronGPT-20B model achieved the best performance on all evaluation metrics. The proposed solution has a low computing cost because the LLM parameters are not updated during prompt-tuning. This study demonstrates the efficiency of generative clinical LLMs for clinical ATS through prompt tuning.
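To make the prompt-tuning idea concrete, here is a minimal sketch using GPT-2 from Hugging Face Transformers as a stand-in for GatorTronGPT: a small matrix of trainable soft-prompt embeddings is prepended to a frozen causal LM, so only the prompt parameters receive gradients. The model choice, prompt length, and example text are assumptions for illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
for p in model.parameters():           # freeze every LM weight
    p.requires_grad = False

n_virtual, dim = 20, model.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(n_virtual, dim) * 0.02)  # only trainable weights

def loss_with_soft_prompt(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt").input_ids
    tok_embeds = model.transformer.wte(ids)                       # token embeddings
    inputs = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    # Mask the virtual-token positions with -100 so they are ignored by the LM loss.
    labels = torch.cat([torch.full((1, n_virtual), -100), ids], dim=1)
    return model(inputs_embeds=inputs, labels=labels).loss

loss = loss_with_soft_prompt("Doctor: How long has the cough lasted? Patient: About two weeks.")
loss.backward()                        # gradients flow only into soft_prompt
```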
Speaker(s):
Mengxian Lyu, Master
University of Florida
Author(s):
Cheng Peng, PhD - University of Florida; Xiaohan Li, Master - University of Florida; Patrick Balian, DBA - University of Florida; Jiang Bian, PhD - University of Florida; Yonghui Wu, PhD - University of Florida;