Times are displayed in (UTC-07:00) Pacific Time (US & Canada)
11/11/2024 | 10:30 AM – 12:00 PM | Continental Ballroom 8-9
S29: Language Models and Beyond - From Words to Wonder
Presentation Type: Oral
Session Chair:
Danielle Mowery
Estimating the effectiveness of a large-scale homelessness program using electronic health record data: a target trial approach
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Causal Inference, Natural Language Processing, Health Equity
Primary Track: Foundations
Electronic Health Record (EHR) data offers researchers and policymakers a data source for studying complex medical and social phenomena. In this study, we used the target trial emulation framework with data from the VA EHR to evaluate the impact of the Supportive Services for Veteran Families (SSVF) program on long-term housing instability, healthcare costs, and all-cause mortality. We found that SSVF reduced the risk of housing instability for at least two years. Our findings also provide insight into the benefits of using EHR data for causal inference while emphasizing the need for careful study design and domain knowledge when analyzing EHR data.
Speaker(s):
Alec Chapman, MS
University of Utah
Author(s):
Daniel Scharfstein, ScD - University of Utah; Ann Elizabeth Montgomery, PhD - University of Alabama; Thomas Byrne, PhD - Boston University; Ying Suo, MPH - University of Utah; Atim Effiong, MPH - University of Utah; Christa Shorter, MS - University of Utah; Sophia Huebler, MS - University of Utah; Richard Nelson, PhD - University of Utah;
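
To make the study design above more concrete, the following is a minimal, purely illustrative sketch of one common estimation step in an emulated target trial: an inverse-probability-weighted risk difference for a binary program exposure and a binary housing-instability outcome. The toy data, covariates, and effect sizes are hypothetical and are not drawn from this study.

# Hypothetical sketch: inverse-probability-weighted (IPW) risk difference
# for an emulated target trial. Toy data only; not the study's code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
# Baseline confounders measured at "time zero" of the emulated trial
age = rng.normal(50, 12, n)
prior_instability = rng.binomial(1, 0.3, n)
X = np.column_stack([age, prior_instability])

# Treatment: enrollment in the (hypothetical) housing program
p_treat = 1 / (1 + np.exp(-(-1.0 + 0.02 * (age - 50) + 1.0 * prior_instability)))
treated = rng.binomial(1, p_treat)

# Outcome: housing instability within two years (toy data-generating process)
p_out = 1 / (1 + np.exp(-(-0.5 - 0.7 * treated + 0.8 * prior_instability)))
outcome = rng.binomial(1, p_out)

# Fit a propensity model and form stabilized inverse-probability weights
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
w = np.where(treated == 1, treated.mean() / ps, (1 - treated.mean()) / (1 - ps))

# Weighted risk in each arm, then the risk difference
risk1 = np.average(outcome[treated == 1], weights=w[treated == 1])
risk0 = np.average(outcome[treated == 0], weights=w[treated == 0])
print(f"IPW risk difference (treated - untreated): {risk1 - risk0:+.3f}")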
Evaluation of Recommender Systems for Phenotypic Concept Tagging of Clinical Free-Text Descriptions
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Bioinformatics, Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Mapping biomedical descriptions to a standard vocabulary may yield larger, more representative patient populations, resulting in better powered, more generalizable studies. Unfortunately, the manual mapping process, while high fidelity, is laborious and time-consuming. This research evaluates the time savings offered by several recommender systems for biomedical concept tagging relative to manual review. The systems comprised OpenAI embeddings, PubMedBERT embeddings, and the UMLS API. All recommender systems tested provided time savings over manual mapping, with varying levels of precision across systems (best: 79%, OpenAI embeddings). These results establish an empirical reference point for researchers and project managers who seek to enrich phenotypes with unstructured data in resource-scarce scenarios.
Speaker(s):
Justin Mower, PhD
Regeneron Pharmaceuticals, Inc.
Author(s):
Amelia Averitt, MPH, MA, PhD - Regeneron Pharmaceuticals; Justin Mower, PhD - Regeneron Pharmaceuticals, Inc.; Miriam Nwaru, MS - Regeneron Pharmaceuticals, Inc.; Edward Olszewski, BSN, MHI - Regeneron Pharmaceuticals, Inc.; Deepika Sharma, MHI - Regeneron Pharmaceuticals, Inc.; Nilanjana Banerjee, PhD - Regeneron Pharmaceuticals, Inc.; Michael Cantor, MA, MD - Regeneron Pharmaceuticals, Inc.;
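
A minimal sketch of how an embedding-based recommender for concept tagging, as evaluated in the abstract above, can be assembled. The encoder model, toy terminology, and top-k logic are illustrative assumptions, not the authors' pipeline.

# Hypothetical sketch of an embedding-based concept recommender.
# Model name and the tiny terminology below are placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # a biomedical encoder could be swapped in

# Candidate standard-vocabulary terms (toy list standing in for UMLS/HPO concepts)
concepts = {
    "HP:0001250": "Seizure",
    "HP:0002315": "Headache",
    "HP:0001945": "Fever",
}
concept_ids = list(concepts)
concept_vecs = model.encode([concepts[c] for c in concept_ids])

def recommend(description: str, k: int = 2):
    """Return the top-k candidate concepts for a free-text description."""
    query_vec = model.encode([description])
    sims = cosine_similarity(query_vec, concept_vecs)[0]
    ranked = sorted(zip(concept_ids, sims), key=lambda x: x[1], reverse=True)
    return ranked[:k]

print(recommend("pt reports recurrent convulsions since childhood"))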
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition
Presentation Time: 11:00 AM - 11:15 AM
Abstract Keywords: Large Language Models (LLMs), Information Extraction, Deep Learning, Natural Language Processing
Primary Track: Applications
Large Language Models (LLMs) have revolutionized various sectors, including healthcare, where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) is an essential task that plays a crucial role in extracting relevant information from clinical texts. Despite the promise of LLMs, current research mostly concentrates on document-level NER, identifying entities in a general context across entire documents without extracting their precise locations. Additionally, efforts have been directed towards adapting ChatGPT for token-level NER. However, there is a significant research gap when it comes to employing token-level NER for clinical texts, especially with local open-source LLMs. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER. We examine the capabilities of these models through a series of experiments involving zero-shot prompting, few-shot prompting, retrieval-augmented generation (RAG), and instruction fine-tuning. Our exploration reveals the inherent challenges LLMs face in token-level NER, particularly in the context of rare diseases, and suggests possible improvements for their application in healthcare. This research helps narrow a significant gap in healthcare informatics and offers insights that could lead to more refined applications of LLMs in the healthcare sector.
Speaker(s):
Qiuhao Lu, Ph.D.
University of Texas Health Science Center at Houston
Author(s):
Qiuhao Lu, Ph.D. - University of Texas Health Science Center at Houston; Rui Li, Ph.D. - University of Texas Health Science Center at Houston; Andrew Wen, MS - University of Texas Health Science Center at Houston; Jinlian Wang, PhD - UTHealth; Liwei Wang, PhD - UTHealth; Hongfang Liu, PhD - University of Texas Health Science Center at Houston;
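
For readers unfamiliar with the distinction the abstract draws, a hypothetical zero-shot setup for token-level clinical NER is sketched below: the prompt asks an LLM for character offsets, and a parser converts those spans to BIO token labels. The prompt wording, entity type, and stubbed call_llm function are assumptions, not the authors' configuration.

# Hypothetical zero-shot prompt for token-level clinical NER, plus a parser
# that converts character spans to BIO tags. call_llm is a stub.
import json
import re

PROMPT = """Extract all disease mentions from the clinical text below.
Return a JSON list of objects with keys "text", "start", and "end",
where start/end are character offsets into the original text.

Text: {text}
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a real API or local-model call; returns a canned answer here.
    return '[{"text": "congestive heart failure", "start": 22, "end": 46}]'

def spans_to_bio(text: str, spans: list[dict]) -> list[tuple[str, str]]:
    """Assign a BIO label to every whitespace-delimited token."""
    labels = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for s in spans:
            if match.start() < s["end"] and match.end() > s["start"]:  # token overlaps span
                tag = "B-DISEASE" if match.start() <= s["start"] else "I-DISEASE"
        labels.append((match.group(), tag))
    return labels

note = "Patient admitted with congestive heart failure, stable on meds."
spans = json.loads(call_llm(PROMPT.format(text=note)))
print(spans_to_bio(note, spans))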
Pre-Trained Large Language Models’ Utility for Food Concept Normalization
Presentation Time: 11:15 AM - 11:30 AM
Abstract Keywords: Patient / Person Generated Health Data (Patient Reported Outcomes), Large Language Models (LLMs), Precision Medicine, Information Extraction, Natural Language Processing, Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Patient-generated health data facilitates more informed precision nutrition interventions, but such data is heterogeneous. This limitation can be mitigated by applying concept mapping techniques to standardize patient-generated health data. However, current approaches struggle with processing abbreviations and detecting food brands. In this study, we argue that pre-trained large language models can improve concept mapping when applied to patient-generated free-text meal records, addressing these challenges.
Speaker(s):
Adit Anand, B.S.
Columbia University
Author(s):
Yanwei Li, BS - Columbia University; Lena Mamykina, PhD - Columbia University; Chunhua Weng, PhD - Columbia University;
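
As a rough illustration of prompt-based food concept normalization (an assumed formulation, not the authors' method), the snippet below asks a chat model to expand abbreviations and map each item in a free-text meal record to a small controlled list. The model name and food concepts are placeholders; the call follows the OpenAI Python SDK.

# Hypothetical prompt-based normalization of a free-text meal record to a
# small controlled food vocabulary. Model name and food list are placeholders.
from openai import OpenAI

FOOD_CONCEPTS = ["Peanut butter", "Whole wheat bread", "Cola (brand-name soft drink)", "Apple"]

def normalize_meal(entry: str) -> str:
    """Ask a chat model to expand abbreviations and map items to the concept list."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        "Map each food item in this meal record to one concept from the list, "
        "expanding abbreviations and brand names.\n"
        f"Concepts: {FOOD_CONCEPTS}\n"
        f"Meal record: {entry}\n"
        "Answer as 'item -> concept', one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(normalize_meal("pb sandwich on ww bread w/ a coke"))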
Evaluating the Performance of Instruction Tuned Large Language Models on Biomedical Entity Recognition
Presentation Time: 11:30 AM - 11:45 AM
Abstract Keywords: Large Language Models (LLMs), Natural Language Processing, Information Extraction
Primary Track: Applications
This study proposes a paradigm based on instruction-tuning Large Language Models (LLMs) that transforms biomedical NER from a sequence labeling task into a generation task. The paradigm repurposes existing NER datasets to develop BioNER-LLaMA using LLaMA2-7B. For the first time, we show a general-domain LLM achieving performance comparable to fine-tuned PubMedBERT models and better performance than a biomedical-specific LLM (PMC-LLaMA). The findings underscore the paradigm's potential for developing LLMs that rival state-of-the-art (SOTA) performance in biomedical applications.
Speaker(s):
Vipina K. Keloth, PhD
Yale University
Author(s):
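
A minimal sketch of the kind of dataset conversion such a paradigm implies, assuming a simple instruction/response format rather than the actual BioNER-LLaMA recipe: a BIO-tagged sentence is rewritten as an instruction-tuning record for a generative model.

# Hypothetical conversion of a BIO-tagged NER example into an
# instruction-tuning record for a generative model. Format is illustrative.
import json

def bio_to_instruction(tokens: list[str], tags: list[str]) -> dict:
    """Collect entity spans from BIO tags and emit an instruction/response pair."""
    entities, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return {
        "instruction": "List all gene mentions in the sentence, separated by semicolons.",
        "input": " ".join(tokens),
        "output": "; ".join(entities) if entities else "None",
    }

tokens = ["BRCA1", "mutations", "increase", "breast", "cancer", "risk", "."]
tags = ["B-GENE", "O", "O", "O", "O", "O", "O"]
print(json.dumps(bio_to_instruction(tokens, tags), indent=2))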
CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models
Presentation Time: 11:45 AM - 12:00 PM
Abstract Keywords: Large Language Models (LLMs), Interoperability and Health Information Exchange, Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
In biomedical research, standardizing the gathering and dissemination of Common Data Elements (CDEs) plays a pivotal role in improving data interoperability and enabling the reuse of scientific data. However, widespread adoption of CDEs has been hindered by challenges such as a lack of awareness, a preference for creating new CDEs rather than harmonizing existing ones, and the complexity involved in selecting appropriate CDEs. To address these challenges, we developed a publicly available, user-friendly tool named CDEMapper, which leverages Large Language Models (LLMs) to improve the efficiency of mapping study variables to NIH CDEs. CDEMapper integrates 23,041 CDEs through indexing and semantic embedding techniques, simplifying the mapping process with advanced search and re-ranking services. Our evaluation results demonstrate significant improvements in mapping accuracy with the incorporation of GPT-4.0, especially in handling multiple-to-one mapping challenges, compared to traditional string-matching algorithms like BM25. This indicates that utilizing LLMs can effectively enhance the accuracy and efficiency of CDE mapping, providing strong support for data standardization and sharing in biomedical research.
Speaker(s):
Jimin Huang, MS
Yale University
Author(s):
Jimin Huang, MS - Yale University; Yan Wang, PhD - Yale University; Huan He, Ph.D. - Yale University; Fongci Lin, PhD - Yale University; Yan Hu - UTHealth Science Center Houston; Qianqian Xie, PhD - Yale University; Pritham Ram, MS - Yale University; Xiaoqian Jiang, PhD - University of Texas Health Science Center at Houston; Hua Xu, Ph.D - Yale University; Na Hong, PhD - Yale University;
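
A minimal sketch of the retrieve-then-re-rank pattern the abstract describes, under assumed components: a placeholder sentence-embedding model indexes a handful of made-up CDE names, a study variable is matched by cosine similarity, and a comment marks where an LLM re-ranking step would go.

# Hypothetical retrieve-then-re-rank sketch for mapping a study variable to
# candidate CDEs. Model name, CDE names, and the re-rank step are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder encoder

cdes = [
    "Systolic blood pressure measurement",
    "Diastolic blood pressure measurement",
    "Body mass index value",
    "Current smoking status indicator",
]
cde_vecs = model.encode(cdes, normalize_embeddings=True)

def retrieve(variable: str, k: int = 2) -> list[str]:
    """Return the k CDE candidates closest to the study variable in embedding space."""
    q = model.encode([variable], normalize_embeddings=True)
    scores = cde_vecs @ q[0]  # cosine similarity, since vectors are normalized
    top = np.argsort(scores)[::-1][:k]
    return [cdes[i] for i in top]

candidates = retrieve("sbp_mmhg (seated systolic BP)")
# An LLM re-ranking step (e.g., a GPT-4-class model) could then pick the best
# candidate, or flag multiple-to-one mappings, before human review.
print(candidates)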
S29: Language Models and Beyond - From Words to Wonder
Description
Date: Monday (11/11)
Time: 10:30 AM to 12:00 PM
Room: Continental Ballroom 8-9