Evaluation of Recommender Systems for Phenotypic Concept Tagging of Clinical Free-Text Descriptions
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Bioinformatics, Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Mapping biomedical descriptions to a standard vocabulary system may yield larger, more representative patient populations, resulting in better-powered, more generalizable studies. Unfortunately, the manual mapping process, while high fidelity, is laborious and time-consuming. This research evaluates the time savings of several recommender systems for biomedical concept tagging relative to such a manual review process. The systems were based on OpenAI embeddings, PubMedBERT embeddings, and the UMLS API. All recommender systems tested provided time savings over manual mapping, with precision varying across systems (best: 79%, OpenAI embeddings). These results provide empirical context for researchers and project managers who seek to enrich phenotypes with unstructured data in resource-scarce scenarios.
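To make the embedding-based approach concrete, the following is a minimal sketch of how a recommender of this kind might rank candidate concepts for a free-text description and how precision might then be scored. It is not the authors' implementation: the abstract does not describe their pipeline or how precision was computed. The concept identifiers, description keys, and three-dimensional vectors are hypothetical toy values standing in for real embedding-model output, and top-1 precision is an assumed metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(query_vec, concept_vecs, k=3):
    """Return the k concept ids whose vectors are most similar to the query."""
    ranked = sorted(
        concept_vecs,
        key=lambda cid: cosine(query_vec, concept_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]

def precision_at_1(recommendations, gold):
    """Fraction of descriptions whose top recommendation matches the gold label."""
    hits = sum(
        1 for text, recs in recommendations.items()
        if recs and recs[0] == gold[text]
    )
    return hits / len(recommendations)

# Hypothetical toy data: in practice these vectors would come from an
# embedding model, and the ids would be codes from a standard vocabulary.
CONCEPTS = {
    "CONCEPT_A": [1.0, 0.0, 0.0],
    "CONCEPT_B": [0.0, 1.0, 0.0],
    "CONCEPT_C": [0.0, 0.0, 1.0],
}
DESCRIPTIONS = {
    "desc_1": [0.9, 0.1, 0.0],
    "desc_2": [0.2, 0.8, 0.1],
    "desc_3": [0.6, 0.0, 0.5],
}
# Hypothetical manually curated mappings used as the reference standard.
GOLD = {"desc_1": "CONCEPT_A", "desc_2": "CONCEPT_B", "desc_3": "CONCEPT_C"}

recs = {text: recommend(vec, CONCEPTS, k=2) for text, vec in DESCRIPTIONS.items()}
score = precision_at_1(recs, GOLD)
print(recs)
print(f"precision@1 = {score:.2f}")
```

In a workflow like the one evaluated, a reviewer would confirm or correct the top-ranked suggestions rather than search the vocabulary from scratch, which is where the time savings would be expected to come from.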
Speaker(s):
Justin Mower, PhD
Regeneron Pharmaceuticals, Inc.
Author(s):
Amelia Averitt, MPH, MA, PhD - Regeneron Pharmaceuticals; Justin Mower, PhD - Regeneron Pharmaceuticals, Inc.; Miriam Nwaru, MS - Regeneron Pharmaceuticals, Inc.; Edward Olszewski, BSN, MHI - Regeneron Pharmaceuticals, Inc.; Deepika Sharma, MHI - Regeneron Pharmaceuticals, Inc.; Nilanjana Banerjee, PhD - Regeneron Pharmaceuticals, Inc.; Michael Cantor, MA, MD - Regeneron Pharmaceuticals, Inc.
Category
Podium Abstract
Description
Date: Monday (11/11)
Time: 10:45 AM to 11:00 AM
Room: Continental Ballroom 8-9