Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
3/13/2025 | 10:00 AM – 11:30 AM | Monongahela
S35: Multimodal Analytics
Presentation Type: Podium Abstract
Session Credits: 1.5
Session Chair:
Caitlin Dreisbach, PhD, RN - University of Rochester
Toward Automated Clinical Transcriptions
Presentation Time: 10:00 AM - 10:15 AM
Abstract Keywords: Clinical and Research Data Collection, Curation, Preservation, or Sharing, Machine Learning, Generative AI, and Predictive Modeling, Natural Language Processing, Data/System Integration, Standardization and Interoperability
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Implementation Science and Deployment in Informatics: Enabling Clinical and Translational Research
Administrative documentation is a major driver of rising healthcare costs and is linked to adverse outcomes, including physician burnout and diminished quality of care. This paper introduces a secure system that applies recent advancements in speech-to-text transcription and speaker-labeling (diarization) to patient-provider conversations. Applied to over 40 hours of simulated conversations, this system offers a promising foundation for automating clinical transcriptions.
Speaker(s):
William Logan, B.S. in Computer Engineering
UKY
Author(s):
Mitchell Klusty, B.S. Computer Science - University of Kentucky; Vaiden Logan, B.S. in Computer Engineering - UKY; Samuel Armstrong, MS - University of Kentucky; Aaron Mullen, B.S. - University of Kentucky; Caroline Leach, BS - University of Kentucky; Jeffery Talbert, PhD - University of Kentucky; Cody Bumgardner, PhD - University of Kentucky;
MedVidDeID: Protecting Privacy in Clinical Encounter Video Recordings
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Clinical and Research Data Collection, Curation, Preservation, or Sharing, Data Security and Privacy, Medical Imaging, Data Sharing/Interoperability
Primary Track: Clinical Research Informatics
Programmatic Theme: Emerging Best Practices for Clinical Research Informatics Operations
Audio/video (AV) healthcare data are increasingly valuable for ethnographic research and multimodal AI models. However, removing sensitive AV data poses a significant privacy challenge. This work presents a modular pipeline for de-identifying AV data, employing open-source tools like WhisperX and YOLOv8. Our results demonstrate a success rate of 99.52% and 97.25% for named-entity removal and video obfuscation, respectively. Future improvements will focus on enhancing transcription, tracking accuracy, and automating quality control.
Speaker(s):
Sriharsha Mopidevi, Master of Science
University of Pennsylvania
Author(s):
Sriharsha Mopidevi, Master of Science - University of Pennsylvania; Kuk Jin Jang, PhD - University of Pennsylvania; Basam Alasaly, Biomedical Informatics, M.S. - Perelman School of Medicine at the University of Pennsylvania; Lauren Malloy, BA - University of Pennsylvania; Eric Eaton, PhD - University of Pennsylvania; Kevin Johnson, MD, MS - University of Pennsylvania;
Evaluating the Reliability and Fairness of Two Multimodal Large Language Models in Identifying Skin Diseases from AI-generated Human Face Images
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Fairness and Disparity Research in Health Informatics, Machine Learning, Generative AI, and Predictive Modeling, Natural Language Processing
Working Group: Natural Language Processing Working Group
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
This study evaluates the reliability and fairness of two large language models, ChatGPT-4 and LLaVA 1.6, in identifying skin diseases from both image and text inputs. Using a synthetic dataset generated by DALL-E 3, the models were compared with state-of-the-art machine learning models. Results show that the LLMs outperform baselines in diagnostic accuracy. While some bias was found in age groups, overall, the LLMs demonstrate potential for fair and accurate remote diagnosis support.
Speaker(s):
Shunxing Bao, Ph.D.
Vanderbilt University
Author(s):
Yuhang Guo, BS - ShanghaiTech University; Shunxing Bao, PhD - Vanderbilt University; Bradley Malin, PhD - Vanderbilt University Medical Center; Zhiyu Wan, PhD - Vanderbilt University Medical Center;
Depression prediction using machine learning with clinical and audio data from the Bridge2AI Voice Data Project
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Biomarker Discovery and Development, Machine Learning, Generative AI, and Predictive Modeling, Proactive Machine Learning and Reinforcement Learning
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Health Data Science and Artificial Intelligence Innovation: From Single-Center to Multi-Site
The Bridge2AI-Voice project collected audio and clinical data from patients across 4 academic medical institutions. Here, 113 audio and clinical features were extracted from 105 patients for machine learning depression prediction. The XGBoost classifier achieved an AUROC of 0.926, while Shapley analysis of features revealed that MFCC3 and F0 can be predictive of depression. Overall, strong model performance and biomarker identification show the utility of machine learning for depression prediction from audio data.
Speaker(s):
William Powell, MS
Washington University in St. Louis
Author(s):
Isaac Kyeremateng, MD, MPH - Washington University in St. Louis; Zachary Abrams, PhD - Institute for Informatics at Washington University School of Medicine in St. Louis;
Acoustic Analysis-based Machine Learning Approaches to Screening for Aspiration Risk in Older Adults with Swallowing Dysfunction
Presentation Time: 11:00 AM - 11:15 AM
Abstract Keywords: Clinical Decision Support for Translational/Data Science Interventions, Biomarker Discovery and Development, Machine Learning, Generative AI, and Predictive Modeling
Primary Track: Clinical Research Informatics
Programmatic Theme: Digital Health Technologies for Patient Research
Swallowing dysfunction is common in older adults and is associated with an increased risk of aspiration pneumonia. Gold standard diagnostic tools require materials and personnel that are not readily accessible to most of the population. This study aims to use various machine learning (ML) techniques to predict aspiration risk through acoustic analysis of voice sounds before and after swallowing water. Of the ML models tested, the convolutional neural network (CNN) had the best performance, with a sensitivity of 78% and a specificity of 90%.
Speaker(s):
Anaïs Rameau, MD
Weill Cornell Medical College
Author(s):
A Positionally Encoded Transformer for Monitoring Health Contexts of Hajj Pilgrims from Wearable Sensor Data
Presentation Time: 11:15 AM - 11:30 AM
Abstract Keywords: Mobile Health, Wearable Devices and Patient-Generated Health Data, Data-Driven Research and Discovery, Learning Healthcare System, Clinical Decision Support for Translational/Data Science Interventions
Working Group: Student Working Group
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Digital Health Technologies for Patient Research
Monitoring the health of individuals during physically demanding tasks, such as the Hajj pilgrimage, requires robust methods for real-time detection of health-relevant contexts, including physical tiredness, emotional mood, and activity type. This paper introduces a positionally encoded Transformer model designed to detect these contexts from time-series data collected via wearable sensors. The model leverages Long Short-Term Memory (LSTM) for feature extraction and Transformer layers for context classification, utilizing positional encoding to capture the sequential dependencies within the sensor data. Our experiments, using data from 19 participants, show that the proposed model achieves high classification accuracy across multiple health-relevant contexts, significantly improving real-time health monitoring.
Speaker(s):
Nazim Belabbaci, PhD
University of Massachusetts Lowell
Author(s):
Nazim Belabbaci, PhD - University of Massachusetts Lowell; Raphael Anaadumba, PhD - University of Massachusetts Lowell; Mohammad Arif Ul Alam, Assistant Professor/PhD - University of Massachusetts Lowell;