Enhancing Causes of Death Prediction from Electronic Health Records through Multi-Modal Integration of Structured and Unstructured EHR Data
Presentation Time: 09:00 AM - 09:15 AM
Abstract Keywords: Natural Language Processing, Machine Learning, Knowledge Representation and Information Modeling, Large Language Models (LLMs), Data Mining, Informatics Implementation, Bioinformatics, Population Health
Primary Track: Applications
Programmatic Theme: Academic Informatics / LIEAF
This study demonstrates the importance of integrating heterogeneous EHR data to improve mortality prediction accuracy. The proposed framework leverages the complementary predictive strengths of structured data elements and unstructured clinical text. Integrating patient-level embeddings generated from clinical notes improved model performance, yielding a 5% increase in F-measure and a 4% increase in AUC over structured data alone. These results highlight the value of multi-modal modeling approaches that combine structured and unstructured EHR data. For specific causes of death (CoDs), combining unstructured notes with structured data improved predictions for 12 of 15 CoDs relative to structured data alone. The most notable gains were for less common conditions such as cerebrovascular disease, essential hypertension, intentional self-harm, and Alzheimer's disease. This suggests that unstructured notes contain signals that can improve performance for classes with fewer samples, as structured data alone may underestimate minority CoDs. This approach can potentially enrich epidemiological research and contribute to improved healthcare policies and practices. The methodology faces several challenges: computational complexity, the resource demands of processing large volumes of unstructured data, and data-quality limitations that might lead to poor generalization, particularly in healthcare settings with constrained computational resources.
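The abstract does not say how the patient-level embeddings are built or how they are combined with the structured data. As a minimal sketch of one common option, and not the authors' implementation, the example below assumes that note-level vectors are mean-pooled into a single patient vector and then concatenated with the structured features (early fusion). The function names, the feature meanings, and the toy values are all hypothetical.

```python
from statistics import fmean


def patient_embedding(note_vectors, dim):
    """Mean-pool note-level vectors into one patient-level vector.

    Assumption: a patient with no notes gets a zero vector of length `dim`.
    """
    if not note_vectors:
        return [0.0] * dim
    # zip(*note_vectors) iterates over embedding dimensions across all notes.
    return [fmean(column) for column in zip(*note_vectors)]


def fuse(structured, note_vectors, dim):
    """Early fusion: structured features followed by the pooled note embedding."""
    return list(structured) + patient_embedding(note_vectors, dim)


# Toy inputs (hypothetical): three structured features and two 2-dimensional
# note vectors for a single patient.
structured = [67.0, 1.0, 3.0]
notes = [[0.5, 1.0], [1.5, 0.0]]

features = fuse(structured, notes, dim=2)
print(features)
```

The fused vector could then be passed to any multi-class classifier over the CoD labels. Concatenation is only one fusion strategy; the abstract does not indicate which one the study used.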
Speaker(s):
Mohammed Al-Garadi, PhD
VUMC
Author(s):
Mohammed Al-Garadi, PhD - VUMC; Ruth Reeves - Tennessee Valley Health Care System, US Veterans' Affairs; Rishi Desai - Brigham and Women's Hospital; Michele LeNoue-Newton; Daniel Park, M.S - VUMC; Shirley Wang - Harvard Medical School/Brigham & Women's; Judith Maro, PhD - Harvard Pilgrim Health Care Institute and Department of Population Medicine, Harvard Medical School, Boston, MA; Candace Fuller, PhD - Harvard Pilgrim Health Care Institute and Department of Population Medicine, Harvard Medical School, Boston, MA, USA; Joshua Lin, PhD - Division of Pharmacoepidemiology and Pharmacoeconomics at the Brigham and Women’s Hospital; José Hernández-Muñoz, PhD - Food and Drug Administration, Silver Spring, MD; Aida Kuzucan, PhD - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD; Xi Wang, PhD - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD; Haritha Pillai, M.S - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA; Kerry Ngan, MCS - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA; Melissa McPheeters, PhD, MPH - RTI International; Jill Whitaker, MSN, RN-BC - VUMC; Michael Matheny, MD, MS, MPH, FACMI, FAMIA - Vanderbilt University Medical Center;
Category
Podium Abstract