11/17/2025 | 3:30 PM – 4:45 PM | Room 7
S49: Extractors Assemble: LLMs and the New Avengers of Biomedical Information Extraction
Presentation Type: Oral Presentations
An Information Extraction Approach to Detecting Novelty of Biomedical Publications
Presentation Time: 03:30 PM - 03:42 PM
Abstract Keywords: Information Extraction, Natural Language Processing, Data Mining
Primary Track: Foundations
Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition and relation extraction tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
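The four-way categorization described above can be sketched as a simple set comparison, assuming NER and relation extraction have already produced entity and relation sets for an article's conclusion. The function name and data shapes below are illustrative, not the authors' implementation.

```python
def categorize_novelty(entities, relations, known_entities, known_relations):
    """Classify an article by whether its conclusion introduces new
    biomedical entities and/or new entity relationships, relative to
    previously known sets (a hypothetical sketch of the framework)."""
    new_entities = set(entities) - set(known_entities)
    new_relations = set(relations) - set(known_relations)
    if new_entities and new_relations:
        return "Entity-Relation Novelty"
    if new_entities:
        return "Entity-only Novelty"
    if new_relations:
        return "Relation-only Novelty"
    return "No Novelty"
```

For example, an article whose conclusion mentions an unseen entity and an unseen relation would fall into the highest-impact category, Entity-Relation Novelty.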
Speaker:
Xueqing Peng, PhD - Yale University
Authors:
Xueqing Peng, PhD - Yale University;
Brian Ondov, PhD - Yale School of Medicine;
Huan He, Ph.D. - Yale University;
Yan Hu, MS - UTHealth Science Center Houston;
Hua Xu, Ph.D - Yale University;
Relation Extraction with Instance-Adapted Predicate Descriptions
Presentation Time: 03:42 PM - 03:54 PM
Abstract Keywords: Natural Language Processing, Information Extraction, Deep Learning
Working Group: Natural Language Processing Working Group
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Relation extraction (RE) is a standard information extraction task playing a major role in downstream applications such as knowledge discovery and question answering. Although decoder-only large language models excel in generative tasks, smaller encoder models are still the go-to architecture for RE. In this paper, we revisit fine-tuning such smaller models using a novel dual-encoder architecture with a joint contrastive and cross-entropy loss. Unlike previous methods that employ a fixed linear layer for predicate representations, our approach uses a second encoder to compute instance-specific predicate representations by infusing them with real entity spans from corresponding input instances. We conducted experiments on two biomedical RE datasets and two general domain datasets. Our approach achieved F1 score improvements ranging from 1% to 2% over state-of-the-art methods with a simple but elegant formulation. Ablation studies justify the importance of various components built into the proposed architecture.
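The joint objective described above can be illustrated with a toy computation: similarity logits between an instance encoding and the instance-adapted predicate encodings, scored with cross-entropy plus a temperature-scaled contrastive (InfoNCE-style) term. The vectors, temperature, and weighting here are assumptions for illustration, not the paper's actual hyperparameters.

```python
import math

def dot(u, v):
    # Inner product between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def joint_loss(instance_vec, predicate_vecs, gold_idx, temperature=0.1, alpha=0.5):
    """Cross-entropy over instance-predicate similarity logits, plus a
    contrastive term that sharpens the same logits by a temperature and
    pulls the instance toward its gold predicate (illustrative sketch)."""
    logits = [dot(instance_vec, p) for p in predicate_vecs]
    ce = -math.log(softmax(logits)[gold_idx])
    scaled = [l / temperature for l in logits]
    contrastive = -math.log(softmax(scaled)[gold_idx])
    return ce + alpha * contrastive
```

The loss is smaller when the instance vector is most similar to its gold predicate vector, which is the behavior the dual-encoder training encourages.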
Speaker:
Ramakanth Kavuluru, PhD - University of Kentucky, College of Medicine
Authors:
Yuhang Jiang, MS - University of Kentucky;
Ramakanth Kavuluru, PhD - University of Kentucky, College of Medicine;
Leveraging Large Language Models for Thyroid Nodule Information Extraction and Matching Across Medical Reports
Presentation Time: 03:54 PM - 04:06 PM
Abstract Keywords: Large Language Models (LLMs), Deep Learning, Clinical Decision Support
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Accurate extraction of thyroid nodule features from radiology and pathology reports is clinically essential for guiding patient management decisions, such as surgical intervention or active surveillance. However, manual data extraction from electronic health records is labor-intensive and prone to inter-rater variability. To address this challenge, we evaluated open-source large language models (LLMs) for automating the extraction and matching of these critical nodule features. Using a retrospective dataset of 451 ultrasound and pathology report pairs, we developed an annotation schema capturing nodule characteristics. Two LLMs—Llama-3.3 70B and QwQ-32B—were benchmarked against manual annotations. Both models demonstrated near-perfect extraction accuracy for clinically relevant features such as location, size, and biopsy results. Notably, QwQ-32B achieved an F1 score of 0.987 on the complex multi-step reasoning task of matching nodules across reports. Our findings suggest that integrating LLMs into clinical annotation workflows can significantly reduce clinician workload and inter-rater variability while maintaining high accuracy.
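The nodule-matching step above was performed by an LLM (QwQ-32B) as a multi-step reasoning task; the rule-based stub below is only a hypothetical illustration of the task's input and output shape, pairing nodules from an ultrasound report with those from a pathology report by location and a size tolerance. Field names and the tolerance value are assumptions.

```python
def match_nodules(ultrasound, pathology, size_tol_cm=0.3):
    """Greedily pair nodules across two reports when they share a
    location and their sizes agree within a tolerance (toy sketch)."""
    matches = []
    used = set()  # pathology indices already paired
    for i, u in enumerate(ultrasound):
        for j, p in enumerate(pathology):
            if j in used:
                continue
            if u["location"] == p["location"] and abs(u["size_cm"] - p["size_cm"]) <= size_tol_cm:
                matches.append((i, j))
                used.add(j)
                break
    return matches
```

In practice the LLM handles far messier inputs (free-text locations, missing sizes), which is why the study benchmarks extraction and matching against manual annotations rather than rules.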
Speaker:
Dongwoo Lee, B.S. - UCLA Medical Informatics Home Area
Authors:
Dongwoo Lee, B.S. - UCLA Medical Informatics Home Area;
Dominic Amara, M.D., MS - UCLA Health;
Chandler Beon, BA - University of California, Los Angeles;
Steven Swee, PhD Student - University of California, Los Angeles;
Ashwath Radhachandra, B.S. - UCLA;
Shreeram Athreya, B.S. - UCLA;
Vedrana Ivezic, Graduate Student;
Corey Arnold, PhD - UCLA;
William Speier, PhD - UCLA;
Addressing Generalizability in Clinical Named Entity Recognition: Federated Learning or Large Language Models? A Case Study on Visual Acuity Extraction from US and UK Eye Institutes
Presentation Time: 04:06 PM - 04:18 PM
Abstract Keywords: Information Extraction, Artificial Intelligence, Data Mining, Privacy and Security
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Clinical Named Entity Recognition (NER) is vital for extracting structured data from clinical text, but ensuring model generalizability across institutions remains challenging. This study compares two approaches: (1) Federated Learning (FL), a privacy-preserving decentralized method, and (2) Large Language Models (LLMs) trained on diverse corpora. We evaluate Visual Acuity (VA) extraction from ophthalmology notes at Stanford (USA) and Moorfields Eye Hospital (UK), using BERT-based models, FL strategies (FedAvg, STWT), and LLMs (LLaMA-3-70B, Mixtral-8x7B). Results show that FL significantly improves generalization, with STWT outperforming FedAvg in stability and accuracy. LLMs demonstrate strong performance on MEH data but struggle with structured Stanford notes. These findings highlight FL’s effectiveness for cross-institutional learning while revealing domain-specific limitations of LLMs, underscoring the need for tailored approaches to clinical NER.
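FedAvg, the baseline federated strategy compared above, aggregates client models by averaging their parameters weighted by local dataset size. The sketch below treats each model as a flat list of floats for clarity; the study's BERT-based NER models would of course use framework tensors.

```python
def fedavg(client_weights, client_sizes):
    """Return the dataset-size-weighted average of client parameter
    vectors, as in the FedAvg aggregation step (simplified sketch)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        for k in range(dim):
            avg[k] += weights[k] * (n / total)
    return avg
```

Each institution trains locally and only these aggregated parameters cross site boundaries, which is what makes the approach privacy-preserving relative to pooling raw clinical notes.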
Speaker:
Quang Nguyen, MRes - UCL Institute of Health Informatics
Authors:
Honghan Wu, PhD - University College London;
Nikolas Pontikos, PhD - UCL Institute of Ophthalmology;
Sophia Wang, MD, MS - Stanford University;
Bridging “the Last Mile”: User-Centered Design for Efficient AI-Assisted Unstructured Clinical Data Abstraction
Presentation Time: 04:18 PM - 04:30 PM
Abstract Keywords: User-centered Design Methods, Artificial Intelligence, Documentation Burden, Human-computer Interaction, Large Language Models (LLMs), Qualitative Methods, Surveys and Needs Analysis
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
This work explores a user-centric approach to AI-assisted data abstraction for clinical research coordinators. Using surveys, interviews, and a Design Thinking workshop to inform a proof-of-concept, a model was developed to extract performance scores from unstructured sources using an LLM. This ongoing research aims to validate feasibility, viability, and desirability while establishing a process to mitigate unexpected deployment risks of an AI solution, known as “the last mile”, in real-world healthcare settings.
Speaker:
Leemor Yuravlivker, BComm - Memorial Sloan Kettering Cancer Center
Authors:
Rohan Singh, Mac - Memorial Sloan Kettering;
Bo Young Kim, Bachelor of Human-Computer Interaction - Memorial Sloan Kettering Cancer Center;
Nancy Bouvier, BS - Memorial Sloan Kettering Cancer Center;
Matt Stapylton, BA - Memorial Sloan Kettering Cancer Center;
Nadia Bahadur, Masters of Clinical Research - Memorial Sloan Kettering Cancer Center;
Andrew Niederhausern, BS - MSKCC;
John Philip, MS - Memorial Sloan Kettering Cancer Center;
Joseph Lengfellner - Memorial Sloan Kettering Cancer Center;
Improving Large Language Model Applications in Biomedicine with Retrieval-Augmented Generation: A Systematic Review, Meta-Analysis, and Clinical Development Guidelines
Presentation Time: 04:30 PM - 04:42 PM
Abstract Keywords: Large Language Models (LLMs), Artificial Intelligence, Natural Language Processing
Primary Track: Applications
This study synthesizes recent research on retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine. A systematic review and meta-analysis of 20 studies demonstrated that RAG improves performance over baseline LLMs (odds ratio 1.35). Based on these findings, we propose clinical guidelines (GUIDE-RAG) to improve the integration of RAG, emphasizing system, knowledge, and electronic health record enhancements.
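The pooled effect above is reported as an odds ratio (OR 1.35 favoring RAG over baseline LLMs). As a reminder of what that statistic means, the sketch below computes an odds ratio from a single 2x2 table of correct versus incorrect answers; the counts are invented, not from the review.

```python
def odds_ratio(rag_correct, rag_incorrect, base_correct, base_incorrect):
    """OR = (a/b) / (c/d) for a 2x2 outcome table: the odds of a correct
    answer with RAG divided by the odds without it (illustrative only)."""
    return (rag_correct / rag_incorrect) / (base_correct / base_incorrect)
```

An OR above 1 means the RAG-augmented system answered correctly more often, in odds terms, than the baseline; a meta-analysis pools such ratios across studies.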
Speaker:
Siru Liu, PhD - Vanderbilt University Medical Center
Authors:
Allison McCoy, PhD, ACHIP, FACMI, FAMIA - Vanderbilt University Medical Center;
Adam Wright, PhD - Vanderbilt University Medical Center;