Detecting Manuscripts Related to Computable Phenotypes Using a Transformer-based Language Model
Presentation Time: 05:00 PM - 06:30 PM
Abstract Keywords: Knowledge Representation and Information Modeling, Artificial Intelligence, Phenomics and Phenome-wide Association Studies, Large Language Models (LLMs)
Primary Track: Applications
Programmatic Theme: Public Health Informatics
Identifying relevant manuscripts for phenomics knowledgebases is a complex and time-consuming task. We developed a Transformer-based language model using a fine-tuned BioBERT model to detect manuscripts related to computable phenotypes. To address BioBERT’s 512-token limit, we introduced a sliding-window method that splits each document into multiple segments and aggregates the per-segment classification scores into a document-level prediction. Our model significantly outperformed the default approach (AUC: 0.99 vs. 0.83, Accuracy: 0.95 vs. 0.72). This method enhances automated identification of phenotyping literature, improving knowledgebase development efficiency.
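A minimal sketch of the sliding-window idea described above. The window size, stride, overlap, and mean aggregation are illustrative assumptions (the abstract does not specify them), and `classify_segment` is a hypothetical stand-in for the fine-tuned BioBERT classifier:

```python
def make_windows(tokens, window_size=512, stride=256):
    """Split a token sequence into overlapping fixed-size segments.

    window_size and stride are assumed values; the 512 cap mirrors
    BioBERT's maximum input length.
    """
    if len(tokens) <= window_size:
        return [tokens]
    windows = []
    for start in range(0, len(tokens) - stride, stride):
        windows.append(tokens[start:start + window_size])
    return windows


def aggregate_scores(scores):
    """Combine per-segment scores into one document score (mean here;
    max-pooling is another common choice)."""
    return sum(scores) / len(scores)


def classify_document(tokens, classify_segment):
    """Score every window with the segment classifier, then aggregate."""
    segments = make_windows(tokens)
    return aggregate_scores([classify_segment(seg) for seg in segments])
```

In practice the segment classifier would return the positive-class probability from the fine-tuned model, and the aggregated score would be thresholded to decide whether the manuscript is phenotype-related.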
Speaker(s):
Junghoon Chae, PhD
Oak Ridge National Laboratory
Author(s):
Junghoon Chae, PhD - Oak Ridge National Laboratory; David Heise; Keith Connatser; Jacqueline Honerlaw, RN, MPH - VA Boston Healthcare System; Monika Maripuri, MBBS, MPH - VA Boston Healthcare System; Yuk-Lam Ho, MPH - VA Boston Healthcare System; Kelly Cho, PhD - VA Boston Healthcare/Harvard Medical School
Category
Poster - Regular