Times are displayed in (UTC-08:00) Pacific Time (US & Canada)
11/12/2024 | 10:30 AM – 12:00 PM | Franciscan A
S74: NLP in Clinical Notes - s/p supp qid
Presentation Type: Oral
Session Chair:
Feifan Liu, PhD - University of Massachusetts Chan Medical School
Evaluating the Performance of Large Language Models for Named Entity Recognition in Ophthalmology Clinical Free-Text Notes
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Natural Language Processing, Information Extraction, Large Language Models (LLMs)
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
This study compared large language models (LLMs) and Bidirectional Encoder Representations from Transformers (BERT) models in identifying medication names, routes, and frequencies from publicly available free-text ophthalmology progress notes of 480 patients. A total of 5,520 annotated lines of text were divided into training (N=3,864), validation (N=1,104), and test (N=552) sets. We evaluated GPT-3.5, GPT-4, PaLM 2, and Gemini on identifying these medication entities, and fine-tuned BERT, BioBERT, ClinicalBERT, DistilBERT, and RoBERTa for the same task using the training set. On the test set, GPT-4 achieved the best performance (micro-averaged F1 0.966). Among the BERT models, BioBERT achieved the best performance (micro-averaged F1 0.874). Modern LLMs outperformed BERT models even in the highly domain-specific task of identifying ophthalmic medication information from progress notes, showcasing the potential of LLMs for medical named entity recognition to enhance patient care.
Speaker(s):
Iyad Majid, High School Degree
Stanford Ophthalmic Informatics and Artificial Intelligence Group
Author(s):
Iyad Majid, HSD - Stanford Ophthalmic Informatics and Artificial Intelligence Group; Vaibhav Mishra, HSD - Stanford Ophthalmic Informatics and Artificial Intelligence Group; Rohith Ravindranath, BS, MS - Stanford Ophthalmic Informatics and Artificial Intelligence Group; Sophia Wang, MD, MS - Stanford University;
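As a rough illustration of the metric reported in this abstract, here is a minimal sketch (not the authors' code) of computing micro-averaged F1 over extracted medication entities; the (text, type) entity tuples and the example note line are hypothetical.

```python
# Minimal sketch of micro-averaged F1 for medication-entity extraction.
# The entity tuples and the example below are hypothetical, not the
# study's data or evaluation code.

def micro_f1(gold: list[set], pred: list[set]) -> float:
    """Micro-averaged F1 over per-line sets of (entity_text, entity_type) pairs."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # extracted and correct
        fp += len(p - g)   # extracted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One line of a progress note annotated with name, route, and frequency:
gold = [{("latanoprost", "NAME"), ("OU", "ROUTE"), ("qhs", "FREQUENCY")}]
pred = [{("latanoprost", "NAME"), ("OU", "ROUTE")}]  # model missed the frequency
print(f"micro-averaged F1 = {micro_f1(gold, pred):.3f}")  # 0.800
```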
Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Large Language Models (LLMs), Deep Learning, Machine Learning, Patient Safety
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Cancer treatments are known to introduce cardiotoxicity, negatively impacting outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is therefore critical to improving the outcomes and safety of cancer treatment. This study examined machine learning (ML) models for identifying cancer patients at risk of HF from electronic health records (EHRs), including traditional ML, a time-aware long short-term memory (T-LSTM) network, and large language models (LLMs) using novel narrative features derived from structured medical codes. We identified a cohort of 12,806 patients from University of Florida Health diagnosed with lung, breast, or colorectal cancer, of whom 1,602 developed HF after their cancer diagnosis. The LLM GatorTron-3.9B achieved the best F1 scores, outperforming traditional support vector machines by 39%, the T-LSTM deep learning model by 7%, and the widely used transformer model BERT by 5.6%. The analysis shows that the proposed narrative features markedly increased feature density and improved performance.
Speaker(s):
Ziyi Chen, Master of Science
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida
Author(s):
Ziyi Chen, Master of Science - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida; Mengyuan Zhang, Bachelor of Science - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida; Mustafa Mohammed Ahmed, MD - Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida; Yi Guo, PhD - University of Florida; Thomas George, MD - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida; Jiang Bian, PhD - University of Florida; Yonghui Wu, PhD - University of Florida;
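To make the idea of narrative features derived from structured medical codes concrete, here is a minimal sketch assuming a simple code-to-description lookup; the lookup table and sentence template are illustrative, not the study's actual feature-construction pipeline.

```python
# Minimal sketch of rendering structured diagnosis codes as a narrative
# feature a language model can consume. The code-description map and the
# visit below are hypothetical illustrations, not the study's pipeline.

ICD10_DESCRIPTIONS = {
    "C34.90": "malignant neoplasm of unspecified part of unspecified lung",
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
}

def codes_to_narrative(visit_codes: list[str]) -> str:
    """Render a visit's diagnosis codes as a dense natural-language sentence."""
    described = [ICD10_DESCRIPTIONS.get(c, c) for c in visit_codes]
    return "The patient was diagnosed with " + "; ".join(described) + "."

print(codes_to_narrative(["C34.90", "I10", "E11.9"]))
```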
Evaluating Large Language Models for Drafting Emergency Department Discharge Summaries
Presentation Time: 11:00 AM - 11:15 AM
Abstract Keywords: Large Language Models (LLMs), Evaluation, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Informatics
We evaluated the performance of two large language models, GPT-3.5-turbo and GPT-4, in generating Emergency Department (ED) discharge summaries. Using 100 randomly selected ED encounters, we found that GPT-4 outperformed GPT-3.5-turbo, generating discharge summaries that were highly accurate but prone to hallucinations and clinical omissions. While our results are promising, further work is needed to understand how to prevent LLM hallucinations and to ensure that all clinically relevant information is included before clinical deployment.
Speaker(s):
Christopher Williams, MB BChir
UCSF
Author(s):
Jaskaran Bains, MD - UCSF; Tianyu Tang, MD - UCSF; Kishan Patel, MD - UCSF; Alexa Lucas, MD - UCSF; Fiona Chen, MD - UCSF; Brenda Miao, BA - UCSF; Atul Butte, MD, PhD - University of California, San Francisco; Aaron Kornblith, MD - UCSF;
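As an illustration of how such drafts can be requested programmatically, here is a minimal sketch using the OpenAI chat completions API; the prompt wording and grounding instruction are assumptions for illustration, not the study's actual prompting or evaluation protocol.

```python
# Minimal sketch of drafting an ED discharge summary with the OpenAI chat
# API. The prompt text is a hypothetical illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_discharge_summary(ed_note: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce variability across drafts
        messages=[
            {"role": "system",
             "content": "You are a clinician drafting an emergency department "
                        "discharge summary. Use only information documented in "
                        "the note; do not add findings that are not present."},
            {"role": "user", "content": ed_note},
        ],
    )
    return response.choices[0].message.content

# Example usage:
# print(draft_discharge_summary(open("ed_encounter.txt").read()))
```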
Natural Language Processing on Unstructured Data in Hypertension Research and Clinical Practice
Presentation Time: 11:15 AM - 11:30 AM
Abstract Keywords: Chronic Care Management, Data Mining, Natural Language Processing
Primary Track: Applications
Hypertension, commonly referred to as high blood pressure, remains a significant global health concern, with its prevalence steadily increasing over the years and now affecting 1.3 billion people worldwide. The management and understanding of this condition have been greatly influenced by the rapid advancement of technology and the growing availability of vast amounts of health-related data. In particular, the advent of Natural Language Processing (NLP) has opened new avenues for researchers and healthcare practitioners to extract valuable insights from the vast sea of unstructured data in hypertension care.
Unstructured clinical data, often stored in the form of electronic health records (EHRs), clinical notes, and medical literature, hold invaluable clinical information that, when harnessed effectively, can enhance our understanding of hypertension etiology, diagnosis, treatment, and patient outcomes. However, efficiently and accurately extracting information from these unstructured narratives remains a significant challenge, in part due to the heterogeneous and complex nature of the medical language embedded in clinical texts. Recent advances in NLP enable automated analysis of unstructured textual data and facilitate the extraction of valuable clinical insights, trend analysis, predictive modeling, and decision support in hypertension care. Despite these potential benefits, there is a gap in understanding how NLP can be applied to unstructured clinical data in hypertension, including the diverse applications, challenges, and opportunities it presents.
This scoping review aims to provide a comprehensive overview of the current landscape of NLP development and applications in hypertension research and clinical practice.
Speaker(s):
Jiancheng Ye, PhD
Weill Cornell Medicine
Author(s):
Multi-label Classification of Suicidal Tendencies Using Large Language Models from Psychiatric Evaluation Notes
Presentation Time: 11:30 AM - 11:45 AM
Abstract Keywords: Large Language Models (LLMs), Natural Language Processing, Patient Safety, Data Mining
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Suicide is a growing public health concern, particularly prevalent among individuals with mental illness. This study explores one of the first single multi-label classifiers for detecting various suicidal tendencies in psychiatric evaluation notes using generative AI techniques (GPT-4), comparing it against multi-label classification implemented as multiple binary classifiers built on other large language models (e.g., BERT). We constructed a novel clinical corpus of 500 annotated initial psychiatric evaluation notes to identify five types of suicidal tendencies: suicidal ideation, suicide attempts, non-suicidal self-injury, exposure to others' suicide, and no suicidal tendency. Our findings indicate the superiority of few-shot prompted LLMs, notably GPT-4 (macro-averaged F1=0.80, exact match ratio=0.66), over fine-tuned BERT-based models (macro-averaged F1=0.77, exact match ratio=0.58), advocating for incorporating in-depth, domain-specific examples to improve the identification of complex clinical expressions. This study underscores the critical role of integrating domain-specific examples into model learning, enhancing contextual awareness and understanding, and demonstrates the feasibility and reliability of multi-label classification with generative AI models in discerning various complex suicidal tendencies from clinical narratives.
Speaker(s):
Zehan (Leo) Li, PhD
The University of Texas Health Science Center at Houston (UTHealth) School of Biomedical Informatics
Author(s):
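For reference, the two metrics reported above can be computed for multi-label predictions as in the following sketch using scikit-learn; the five label columns and the toy predictions are hypothetical.

```python
# Minimal sketch of macro-averaged F1 and exact match ratio (subset
# accuracy) for multi-label classification. Labels and data are
# hypothetical illustrations, not the study's corpus.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

LABELS = ["suicidal_ideation", "suicide_attempt", "non_suicidal_self_injury",
          "exposure_to_suicide", "none"]

# Rows = notes, columns = labels (1 = tendency present in the note).
y_true = np.array([[1, 0, 0, 0, 0],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],   # missed the suicide-attempt label
                   [0, 0, 0, 0, 1]])

print("macro-averaged F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("exact match ratio:", accuracy_score(y_true, y_pred))  # subset accuracy
```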
Extracting Social Determinants of Health Information from Clinical Notes Using Large Language Models
Presentation Time: 11:45 AM - 12:00 PM
Abstract Keywords: Diversity, Equity, Inclusion, Accessibility, and Health Equity, Large Language Models (LLMs), Information Extraction, Natural Language Processing, Data Mining, Health Equity
Primary Track: Applications
Our work investigates the potential of large language models (LLMs) for extracting and categorizing social determinants of health (SDOH) from clinical notes. Analysis of the MIMIC dataset revealed low utilization of SDOH ICD codes, yet application of NLP surfaced substantial SDOH detail in the clinical notes. Despite the potential of free text for SDOH extraction, challenges emerged when using Llama-2 relative to previous benchmarks. This research emphasizes the need for methodological refinement and annotated datasets to advance the integration of SDOH into healthcare practice and research.
Speaker(s):
Tim Schwirtlich, PhD
Northwestern University
Author(s):
Tim Schwirtlich, PhD - Northwestern University; Vivian Pan, MS, CGC - University of Illinois; Sara Muhammad, MS - University of Illinois Chicago; Saki Amagai - Northwestern University; Yuan Luo, PhD - Northwestern University;
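As a rough illustration, here is a minimal sketch of prompting Llama-2 for SDOH categories with the Hugging Face transformers pipeline; the category list, prompt wording, and note snippet are assumptions, not the study's protocol (and the model weights require accepting Meta's license on the Hub).

```python
# Minimal sketch of prompting Llama-2 to tag SDOH categories in a note
# snippet. Prompt, categories, and snippet are hypothetical illustrations.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

PROMPT = """[INST] List any social determinants of health mentioned in the
note below, choosing only from: housing, employment, food insecurity,
social support, substance use. Answer 'none' if none are mentioned.

Note: Patient lives alone, recently lost his job, and reports drinking
6 beers nightly. [/INST]"""

result = generator(PROMPT, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```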