Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large, Accurately Curated Cohort from EHR data: A Case Study in the Context of a Clinical Trial Emulation
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Natural Language Processing, Causal Inference, Information Extraction, Internal Medicine or Medical Subspecialty, Informatics Implementation, Human-computer Interaction, Knowledge Representation and Information Modeling, Large Language Models (LLMs)
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
The aim of this work was to create a gold-standard curated cohort of ~10,000 cases from the Veterans Affairs (VA) Corporate Data Warehouse for virtual emulation of a randomized clinical trial (CSP#592). Six of the trial's inclusion/exclusion criteria lacked adequate structured data. We therefore used a hybrid computer/human approach to extract this information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. Such curated datasets, which capture specialist reasoning through a process-supervised approach, are likely to become increasingly important as training resources for future clinical AI applications.
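The abstract does not describe the NLP rules themselves. As a purely illustrative sketch, assuming that "rule-based NLP-lite" means keyword/regex matching over note text with a crude negation check, the Python below shows how each match could be exported, together with its source note ID, as a spreadsheet row awaiting a human adjudication decision. The criterion name, pattern, negation cues, window size, and column names are hypothetical placeholders, not the study's actual rules or layout.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical rule set: criterion name -> compiled pattern.
# Illustrative placeholder only, NOT one of the CSP#592 criteria.
RULES = {
    "criterion_A": re.compile(r"\bdialysis\b", re.IGNORECASE),
}

# Hypothetical negation cues: a cue word followed by up to 40 non-period
# characters that run right up to the start of the match.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,40}$", re.IGNORECASE)


@dataclass
class Hit:
    patient_id: str
    note_id: str      # source document backing the eventual decision
    criterion: str
    snippet: str      # context shown to the adjudicator
    negated: bool     # crude machine guess; humans make the final call


def extract_hits(notes, rules=RULES, window=60):
    """Scan (patient_id, note_id, text) tuples and return one Hit per rule match."""
    hits = []
    for patient_id, note_id, text in notes:
        for criterion, pattern in rules.items():
            for m in pattern.finditer(text):
                start = max(0, m.start() - window)
                end = min(len(text), m.end() + window)
                before = text[start:m.start()]
                hits.append(Hit(
                    patient_id,
                    note_id,
                    criterion,
                    text[start:end].replace("\n", " "),
                    bool(NEGATION.search(before)),
                ))
    return hits


def to_adjudication_csv(hits):
    """Render hits as spreadsheet rows with an empty column for the human decision."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["patient_id", "note_id", "criterion", "snippet",
                     "nlp_negated", "adjudicator_decision"])
    for h in hits:
        writer.writerow([h.patient_id, h.note_id, h.criterion, h.snippet, h.negated, ""])
    return buf.getvalue()
```

In this sketch the machine output only pre-sorts evidence; the `adjudicator_decision` column is left blank so reviewers can record the final call, and feedback from their disagreements could be folded back into `RULES` on the next iteration.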
Speaker(s):
Pradeep Mutalik, MD
Yale University School of Medicine
Author(s):
Kei-Hoi Cheung, PhD - Biomedical Informatics and Data Science; Jennifer Green, BA - VA Portland Health Care System; Melissa Buelt-Gebhardt, PhD, ACRP-CP - VA Minneapolis Health Care System; Karen Anderson, BA - Yale University School of Medicine; Vales JeanPaul, MSHS, MBA/HCM - VA Connecticut Health Care System; Linda McDonald, BS, RN - Cooperative Studies Program Coordinating Center, VA Connecticut Health Care Center; Michael Wininger, PhD - Yale University School of Medicine; Yuli Li, MS - VA Cooperative Studies Program Clinical Epidemiology Research Center, VA Connecticut Health Care System; Nallakkandi Rajeevan, PhD - VA Cooperative Studies Program Clinical Epidemiology Research Center, VA Connecticut Health Care System; Peter Jessel, MD - VA Portland Health Care System; Hans Moore, MD, FHRS - VA Washington DC Health Care; Selçuk Adabag, MD - Minneapolis; Merritt Raitt, MD - VA Portland Health Care System; Mihaela Aslan, PhD - VA Cooperative Studies Program Clinical Epidemiology Research Center
Category: Paper - Regular
Date: Monday (11/11)
Time: 04:00 PM to 04:15 PM
Room: Continental Ballroom 1-2