American Medical Informatics Association - From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation

Does Recording Hardware Matter for Clinical Speech Recognition? Evaluating ASR Performance Across Consumer Devices

Presentation Type: Paper - Student
Presentation Time: 03:30 PM - 03:42 PM

Abstract Keywords: Documentation Burden, Artificial Intelligence, Large Language Models (LLMs)
Programmatic Theme: Clinical Informatics

Ambient clinical intelligence (ACI) systems use automatic speech recognition (ASR) to capture patient-provider conversations for downstream clinical documentation. However, many ASR evaluations are conducted under controlled conditions using specialized hardware. We evaluated how recording devices influence transcription performance of contemporary ASR engines applied to clinical dialogue. Thirty-five primary care encounters were re-enacted from transcribed conversations and recorded using five devices simultaneously: smartphone, laptop microphone, portable recorder, clip-on microphone, and a desktop microphone. Six ASR engines were evaluated using word error rate (WER), clinical concept extraction precision and recall, and sentence-level semantic similarity. Median WER ranged from 16.7% to 20.7% across engines. Engine choice produced larger variation in transcription performance than recording device, although device-related differences were statistically significant. Overall, contemporary ASR engines demonstrated relative robustness to consumer-grade recording hardware, suggesting that model selection may have greater impact on transcription performance than recording device configuration in real-world ACI deployments.
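
For readers unfamiliar with the headline metric, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The function name, the lowercase and whitespace tokenization, and the example sentences are illustrative assumptions; the abstract does not describe the authors' text normalization or scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: simple lowercase + whitespace tokenization. Real evaluations
    usually normalize punctuation, numerals, and fillers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the patient takes metformin daily"
    hyp = "the patient take metformin"
    # One substitution (takes -> take) and one deletion (daily): 2 / 5
    print(word_error_rate(ref, hyp))  # 0.4
```

Because the denominator is the reference length, WER can exceed 100% when the output contains many inserted words.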

Speaker(s):
Brian Tran, BS
University of California, Irvine

Author(s):
Brian Tran, BS - University of California, Irvine; Di Hu, Master of Science in Information Systems - University of California - Irvine; Seungjun Kim; Yawen Guo, MISM - University of California - Irvine; Ramya Mangu, BS - UC Irvine; Tera Reynolds, PhD, MPH, MS - University of Maryland Baltimore County; Jennifer Lafata, PhD - UNC Health; Ming Tai-Seale, PhD, MPH - UCSD; Kai Zheng, PhD - University of California, Irvine;

Who Said What? Acoustic and LLM Hybrid Speaker Diarization for Patient Speech Identification in Community-Based Managed Long-term Care Phone Call Communications

Presentation Type: Podium Abstract
Presentation Time: 03:42 PM - 03:54 PM

Abstract Keywords: Natural Language Processing, Large Language Models (LLMs), Artificial Intelligence, Information Extraction, Telemedicine, Chronic Care Management, Machine Learning, Documentation Burden
Programmatic Theme: Clinical Informatics

This study developed a hybrid speaker diarization framework for clinical telephone conversations, integrating acoustic diarization with large language model contextual validation. Applied to 70 healthcare coordination recordings, the framework improved the F1 score by 3.8 percentage points (a 4.6% relative gain) over acoustic diarization alone. Contextual validation reduced role misattribution in acoustically ambiguous segments while preserving sensitivity to speech. These findings suggest that combining acoustic and semantic inference improves speaker attribution for downstream clinical natural language processing.
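
The two gain figures quoted above can be related with a line of arithmetic. The sketch below assumes that 3.8 is an absolute gain in F1 points and 4.6% is the gain relative to the acoustic-only baseline. Under that reading the baseline is simply the absolute gain divided by the relative gain. The abstract does not report the baseline or final F1, so the values produced here are implied estimates, not reported results, and the function name is hypothetical.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain.

    relative_gain = absolute_gain / baseline, so
    baseline = absolute_gain / relative_gain.
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return absolute_gain / relative_gain


if __name__ == "__main__":
    # Assumption: 3.8 F1 points absolute, 4.6% relative (from the abstract).
    baseline = implied_baseline(3.8, 0.046)
    print(round(baseline, 1))        # 82.6 (implied acoustic-only F1)
    print(round(baseline + 3.8, 1))  # 86.4 (implied hybrid F1)
```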

Speaker(s):
Shuxuan Li, master
University of Pennsylvania

Author(s):
Shuxuan Li, master - University of Pennsylvania; Yiheng Zhang, MA - University of Pennsylvania; Sang Bin You, MSN, RN - University of Pennsylvania; Maxim Topaz, PhD, RN - Columbia University School of Nursing; Jiyoun Song, PhD - University of Pennsylvania School of Nursing;

Using Conversational Signals for Acute Care Risk Stratification: A Multimodal AI Approach in Home Health Care

Presentation Type: Podium Abstract
Presentation Time: 03:54 PM - 04:06 PM

Abstract Keywords: Artificial Intelligence, Transitions of Care, Workflow
Programmatic Theme: Clinical Informatics

Acute care use remains common during home healthcare (HHC). We examined whether conversational signals from follow-up phone calls predict hospitalization or emergency department use within 30 and 60 days. Among 174 patients, multimodal models integrating transcript and acoustic features achieved AUROCs of 0.85 (30 days) and 0.79 (60 days). High-risk patients had markedly higher event odds than low-risk patients. Performance persisted using only the first call, supporting scalable risk stratification in HHC.
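
As background on the reported metric, AUROC can be read as the probability that a randomly chosen patient who had an event receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that pairwise definition. The function name, labels, and scores are made-up illustrations and have no connection to the study data or the authors' models.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical example: 1 = acute care event, 0 = no event.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.2, 0.1]
    print(round(auroc(labels, scores), 3))  # 0.833 (5 of 6 pairs ranked correctly)
```

The quadratic loop is fine for a cohort of a few hundred patients; larger datasets would normally use a rank-based implementation.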

Speaker(s):
Zhihong Zhang, PhD
Columbia University

Author(s):
Pallavi Gupta, PhD - Columbia University; Maryam Zolnoori, MSc, PhD - Columbia University; Maxim Topaz, PhD - Columbia University;

MonteRET: An AI Agent Enhancing Multimodal LLMs with Multi-Granularity Knowledge Retrieval for CT Report Generation

Presentation Type: Podium Abstract
Presentation Time: 04:06 PM - 04:18 PM

Abstract Keywords: Artificial Intelligence, Natural Language Processing, Deep Learning
Working Group: Biomedical Imaging Informatics Working Group
Programmatic Theme: Public Health Informatics

MonteRET is a region-aware, knowledge-augmented AI framework for automated 3D chest CT report generation. By decomposing CT volumes into anatomical regions and employing a multi-agent retrieval architecture, MonteRET achieved a clinical micro-F1 score of 42.0% on internal testing with a 29.8% relative improvement over the baseline (P < .001), and reduced error severity in blinded expert evaluation, outperforming six state-of-the-art models.
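
For context on the metric, a micro-averaged F1 pools true positives, false positives, and false negatives across all reports and finding labels before computing a single score. The sketch below assumes each report has already been reduced to a set of finding labels. How the authors extract findings from generated reports is not stated in the abstract, and the function name and label names here are purely illustrative.

```python
def micro_f1(reference, predicted):
    """Micro-averaged F1 over per-report sets of finding labels.

    Assumption: reference[i] and predicted[i] are sets of labels for report i.
    Counts are pooled over all reports before the F1 is computed.
    """
    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        tp += len(ref & pred)   # findings present in both
        fp += len(pred - ref)   # findings only in the generated report
        fn += len(ref - pred)   # findings missed by the generated report
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical labels for three reports.
    reference = [{"nodule", "effusion"}, {"atelectasis"}, set()]
    predicted = [{"nodule"}, {"atelectasis", "effusion"}, {"nodule"}]
    # tp = 2, fp = 2, fn = 1, so F1 = 4 / 7
    print(round(micro_f1(reference, predicted), 3))  # 0.571
```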

Speaker(s):
Yi LIN, PhD
Weill Cornell Medicine

Author(s):
Yi LIN, PhD - Weill Cornell Medicine;

From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation

Presentation Type: Podium Abstract
Presentation Time: 04:18 PM - 04:30 PM

Abstract Keywords: Large Language Models (LLMs), Natural Language Processing, Artificial Intelligence, Data Mining, Information Extraction, Evaluation, Quantitative Methods, Deep Learning
Programmatic Theme: Clinical Informatics

Existing medical multi-modal large language models (MLLMs) are limited to single-image understanding, hindering real-world clinical application. To address multi-image data scarcity, we propose a novel five-stage, context-aware instruction generation paradigm leveraging over 237,000 license-permissive compound figures from biomedical literature. Using this data, we developed M³LLM. Extensive experiments demonstrate M³LLM significantly outperforms state-of-the-art baselines on our expert-validated PMC-MI-Bench and real-world longitudinal MIMIC chest X-ray tasks.

Speaker(s):
Yihang Fu, Master
Yale University

Author(s):
Yihang Fu, Master - Yale University;

Coverage-Aware Exemplar Selection for Few-Shot Generative Modeling of Plain Knee Radiographs

Presentation Type: Paper - Student
Presentation Time: 04:30 PM - 04:42 PM

Abstract Keywords: Artificial Intelligence, Deep Learning, Machine Learning, Data Mining, Evaluation
Programmatic Theme: Clinical Informatics

Few-shot generative modeling has emerged as a promising strategy for addressing limited data availability in medical imaging. Most approaches rely on random selection of training exemplars, implicitly assuming that a small subset adequately represents the variability present in clinical imaging datasets. In radiology, however, normal knee radiographs exhibit substantial heterogeneity in anatomy, positioning, exposure, and scanner characteristics, which can cause few-shot generators to overfit narrow modes of the data distribution. In this contribution, we introduce and evaluate coverage-aware exemplar selection as an alternative to random sampling for few-shot generative modeling of plain knee radiographs. Using a frozen medical imaging foundation model to embed images into a semantic feature space, we formulate exemplar selection as coverage optimization and evaluate strategies including random sampling, k-center selection, k-means clustering, and facility location. Models trained using coverage-aware subsets demonstrate improved distributional coverage and higher recall compared with random sampling while maintaining comparable fidelity.
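
One of the strategies named above, k-center selection, is commonly approximated with a greedy farthest-point rule: repeatedly add the image whose embedding is farthest from everything already chosen. The sketch below illustrates that generic heuristic on toy 2-D points. It is not the authors' implementation. The function name, the Euclidean distance, the fixed starting index, and the example coordinates are all assumptions, and real embeddings from a foundation model would be much higher-dimensional.

```python
import math


def k_center_greedy(embeddings, k, first=0):
    """Greedy farthest-point selection of k exemplar indices.

    embeddings: sequence of equal-length numeric vectors.
    first: index of the initial exemplar (an arbitrary choice in this sketch).
    """
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of embeddings")

    selected = [first]
    # min_dist[i] = distance from point i to its nearest selected exemplar.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < k:
        # Pick the point that is currently worst covered.
        nxt = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return selected


if __name__ == "__main__":
    # Toy embeddings: two tight pairs and one isolated point.
    points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
    print(k_center_greedy(points, k=3))  # [0, 4, 2]
```

In the toy example the selection takes one point from each region rather than two near-duplicates, which is the coverage behavior that random sampling does not guarantee.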

Speaker(s):
Nickolas Littlefield, MS, Statistics
University of Pittsburgh

Author(s):
Nickolas Littlefield, MS, Statistics - University of Pittsburgh; Dana L. Tudorascu, PhD - University of Pittsburgh; Johannes F. Plate, MD, PhD - University of Pittsburgh; Ahmad P. Tafti, PhD - University of Pittsburgh;

S89: Images and Voices: Multimodal AI for Clinical Understanding (Oral Presentations)
