Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
3/12/2025 |
1:30 PM – 3:00 PM |
Conference A
S24: Explainable AI
Presentation Type: Podium Abstract
Session Credits: 1.5
Session Chair:
Lina Sulieman, PhD - Vanderbilt University Medical Center
Which AI Explanations Do Clinicians Prefer? A Survey on Perceptions of XAI for a Classification Task
Presentation Time: 01:30 PM - 01:45 PM
Abstract Keywords: Advanced Data Visualization Tools and Techniques, Clinical Decision Support for Translational/Data Science Interventions, Implementation Science and Deployment
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Implementation Science and Deployment in Informatics: Enabling Clinical and Translational Research
In this work, we investigate clinicians’ perceptions of three different XAI tools, applied to patient clinical data, through a user-centered study involving 10 clinicians. We conduct a questionnaire-based experiment to collect clinicians’ cognitive evaluations of, and preferences among, the explanations. The main finding concerns the influence of expertise and specialty on XAI advice selection, with a general preference for SHAP, though AraucanaXAI is nearly as favored by ER doctors and more experienced clinicians.
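The study compares clinician preferences between SHAP-style feature attributions and AraucanaXAI. As context for readers less familiar with the former, below is a minimal sketch of how per-patient SHAP attributions can be produced for a tabular classifier; the dataset, feature names, and model are hypothetical placeholders and do not reflect the study's actual pipeline.

```python
# Minimal sketch: per-patient SHAP attributions for a tabular clinical classifier.
# Data, features, and model are illustrative placeholders, not the study's pipeline.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical tabular clinical data: rows are patients, columns are features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 4)), columns=["age", "crp", "spo2", "heart_rate"])
y = (X["crp"] + X["heart_rate"] > 1.0).astype(int)  # toy outcome label

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles; for a
# binary GradientBoostingClassifier it returns one (n_samples, n_features) array.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Attribution for one patient: which features pushed the prediction up or down.
print(dict(zip(X.columns, np.round(shap_values[0], 3))))
```

Each attribution indicates how much a feature pushed that patient's prediction above or below the model's baseline output, which is the kind of per-case explanation clinicians were asked to evaluate.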
Speaker(s):
Laura Bergomi, MEng
University of Pavia
Author(s):
Laura Bergomi, MEng - University of Pavia; Giovanna Nicora - University of Pavia; Marta A. Orlowska, BCompSc - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; Chiara Podrecca, MEng - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; Riccardo Bellazzi, PhD - University of Pavia; Caterina Fregosi, Computer Science - Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy; Michele Catalano, Medicine and Surgery - Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy and Radiology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Chandra Bortolotto, Medicine and Surgery - Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy and Radiology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Lorenzo Preda, Medicine and Surgery - Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy and Radiology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Enea Parimbelli, PhD - University of Ottawa;
Explainable Diagnosis Prediction through Neuro-Symbolic Integration
Presentation Time: 01:45 PM - 02:00 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Fairness and Disparity Research in Health Informatics, Clinical Decision Support for Translational/Data Science Interventions
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Health Data Science and Artificial Intelligence Innovation: From Single-Center to Multi-Site
Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability, which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. We design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, outperform traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52%) and AUROC scores (up to 0.8457) in a case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.
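As an illustration of the "logical rules with learnable thresholds" idea, here is a generic, self-contained sketch of a differentiable weighted-AND neuron in the spirit of Logical Neural Networks; it is not the authors' implementation, and the rule and feature names are hypothetical.

```python
# Generic sketch (not the authors' code): a differentiable logical-AND neuron
# with learnable weights and threshold, in the spirit of Logical Neural Networks.
# Inputs are truth values in [0, 1], e.g. "glucose is high", "BMI is high".
import torch
import torch.nn as nn

class WeightedAnd(nn.Module):
    """Real-valued AND: output is high only when all weighted inputs are high."""
    def __init__(self, n_inputs: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_inputs))  # learnable importance per rule input
        self.beta = nn.Parameter(torch.tensor(1.0))        # learnable threshold/bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Lukasiewicz-style weighted conjunction, clamped to [0, 1].
        return torch.clamp(self.beta - (self.weights * (1.0 - x)).sum(dim=-1), 0.0, 1.0)

# Toy rule: diabetes_risk <- high_glucose AND high_bmi (feature names are hypothetical).
rule = WeightedAnd(n_inputs=2)
truth_values = torch.tensor([[0.9, 0.8], [0.9, 0.1]])  # two patients
print(rule(truth_values))  # higher for the first patient, near zero for the second
```

After training, inspecting `rule.weights` and `rule.beta` shows how strongly each antecedent contributes to the rule firing, which is the kind of direct feature-level insight the abstract attributes to the learned LNN parameters.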
Speaker(s):
Qiuhao Lu, Ph.D.
University of Texas Health Science Center at Houston
Author(s):
Qiuhao Lu, Ph.D. - University of Texas Health Science Center at Houston; Rui Li, PhD - UTHealth; Elham Sagheb Hossein Pour, Master of Science - Mayo Clinic; Andrew Wen, MS - University of Texas Health Science Center at Houston; Jinlian Wang, PhD - UTHealth; Liwei Wang, MD, PhD - UTHealth; Jungwei Fan, Ph.D. - Mayo Clinic; Hongfang Liu, PhD - University of Texas Health Science Center at Houston;
Explainable Artificial Intelligence (XAI) in the Era of Large Language Models: Applying an XAI Framework in Pediatric Ophthalmology Diagnosis using the Gemini Model
Presentation Time: 02:00 PM - 02:15 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Learning Healthcare System, Clinical Decision Support for Translational/Data Science Interventions
Primary Track: Clinical Research Informatics
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Amblyopia is a neurodevelopmental disorder that compromises children's visual acuity, requiring early diagnosis for effective treatment. Traditional diagnostic methods rely on subjective evaluations of eye-tracking data by specialized pediatric ophthalmologists, who are often scarce in low-resource settings. This creates a need for a scalable, cost-effective method to automatically analyze eye-tracking recordings. Large Language Models (LLMs) have shown potential in diagnosing amblyopia; our prior work demonstrated that Google Gemini, guided by expert ophthalmologists, can distinguish amblyopia from control subjects. However, the opaque “black-box” nature of LLMs raises transparency and trust concerns in medical contexts. To address this, we developed a Feature-Guided Interpretative Prompting (FGIP) framework focused on critical clinical features and applied the Quantus framework to evaluate the Gemini model’s outputs. Specifically, we assessed classification performance on high-fidelity eye-tracking data across faithfulness, robustness, localization, and complexity. These metrics provide insight into how the model reaches its decisions, supporting interpretability. This work represents the first systematic use of Explainable Artificial Intelligence (XAI) to examine Gemini outputs in detecting amblyopia, including cases with nystagmus. Results show that the model not only achieved high accuracy in classifying amblyopic and control participants but also maintained transparency and clinical relevance. By demonstrating that advanced AI tools can be made interpretable and aligned with expert reasoning, we highlight the potential for developing a scalable, interpretable clinical decision support (CDS) system using LLMs. This study can improve the trustworthiness and adoption of AI-driven solutions in pediatric ophthalmology, leading to better patient outcomes and broader healthcare accessibility.
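The Quantus framework standardizes explanation-quality metrics such as faithfulness. To convey the underlying idea without reproducing Quantus's API, the sketch below hand-rolls a simple deletion-style faithfulness check over hypothetical eye-tracking summary features; the model and attributions are placeholders, not the study's Gemini pipeline.

```python
# Illustration only: a hand-rolled, deletion-style faithfulness check of the kind
# the Quantus framework standardizes. Model, features, and attributions below are
# hypothetical placeholders, not the study's Gemini pipeline or Quantus's API.
import numpy as np

def deletion_faithfulness(predict, x, attributions, baseline=0.0):
    """Remove features from most- to least-attributed and track the prediction drop.
    A faithful explanation should make the score fall quickly."""
    order = np.argsort(-np.abs(attributions))   # most important first
    scores = [predict(x)]
    x_masked = x.copy()
    for i in order:
        x_masked[i] = baseline                  # "delete" the feature
        scores.append(predict(x_masked))
    return np.array(scores)

# Toy model over hypothetical eye-tracking summary features
# (fixation stability, saccade amplitude, nystagmus amplitude, gaze deviation).
weights = np.array([0.5, 0.2, 1.0, 0.1])
predict = lambda x: 1.0 / (1.0 + np.exp(-(x @ weights - 1.0)))

x = np.array([1.2, 0.8, 1.5, 0.3])
attributions = weights * x                      # stand-in for a model explanation
print(deletion_faithfulness(predict, x, attributions))
```

If the prediction collapses as the highest-attributed features are removed, the explanation is considered faithful; robustness, localization, and complexity are assessed with analogous, complementary probes.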
Speaker(s):
Dipak Upadhyaya, PhD Student
Case Western Reserve University
Author(s):
Katrina Prantzalos, MS - Case Western Reserve University; Pedram Golnari, MD - Case Western Reserve University; Satya Sahoo, PhD - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA; Fatema Ghasia, MD - Visual Neuroscience Laboratory, Cole Eye Institute, Cleveland Clinic, Cleveland, OH 44106, USA; Aasef Shaikh, MD - National VA Parkinson’s Consortium Center, Louis Stokes Cleveland VA Medical Center, OH 44106, USA; Subhashini Sivagnanam, NS - San Diego Supercomputer Center, University of California, San Diego, CA 92093, USA; Amitava Majumdar, PhD - San Diego Supercomputer Center, University of California, San Diego, CA 92093, USA;
Explainable AI for Clinical Outcome Prediction: A Survey of Clinician Perceptions and Preference
Presentation Time: 02:15 PM - 02:30 PM
Abstract Keywords: Natural Language Processing, Clinical Decision Support for Translational/Data Science Interventions, Machine Learning, Generative AI, and Predictive Modeling
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Explainable AI (XAI) techniques are necessary to help clinicians make sense of AI predictions and integrate predictions into their decision-making workflow. In this work, we conduct a survey study to understand clinicians' preferences among different XAI techniques when they are used to interpret model predictions over text-based EHR data. We implement four XAI techniques (LIME, attention-based span highlights, exemplar patient retrieval, and free-text rationales generated by LLMs) on an outcome prediction model that uses ICU admission notes to predict a patient’s likelihood of experiencing in-hospital mortality. Using these XAI implementations, we design and conduct a survey study of 32 practicing clinicians, collecting their feedback and preferences on the four techniques. We synthesize our findings into a set of recommendations describing when each XAI technique may be more appropriate, its potential limitations, and directions for improvement.
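Of the four techniques surveyed, LIME is the most compact to demonstrate in code. Below is a minimal sketch of LIME span highlights over a toy note classifier; the notes, labels, and model are illustrative stand-ins, not the study's ICU admission notes or mortality model.

```python
# Minimal sketch: LIME span highlights over a toy note classifier. The notes,
# labels, and model are illustrative placeholders, not the study's ICU dataset
# or mortality model.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "stable vitals, alert and oriented, tolerating diet",
    "intubated, on pressors, worsening lactic acidosis",
    "ambulating, pain controlled, plan for discharge",
    "septic shock, anuric, escalating vasopressor requirement",
]
labels = [0, 1, 0, 1]  # toy in-hospital mortality labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(notes, labels)

explainer = LimeTextExplainer(class_names=["survived", "died"])
exp = explainer.explain_instance(
    "on pressors with worsening acidosis overnight",
    model.predict_proba,         # LIME perturbs the text and queries this function
    num_features=5,
)
print(exp.as_list())             # (token, weight) pairs usable as span highlights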
Speaker(s):
Jun Hou, PhD Candidate
Virginia Tech
Author(s):
Jun Hou, PhD Candidate - Virginia Tech; Lucy Lu Wang, PhD - Allen Institute for Artificial Intelligence;
Building Trust in Clinical AI: A Web-Based Explainable Decision Support System for Chronic Kidney Disease
Presentation Time: 02:30 PM - 02:45 PM
Abstract Keywords: Clinical Decision Support for Translational/Data Science Interventions, Informatics Research/Biomedical Informatics Research Methods, Biomedical Informatics and Data Science Workforce Education
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Health Data Science and Artificial Intelligence Innovation: From Single-Center to Multi-Site
Chronic Kidney Disease (CKD) is a significant global public health issue, affecting over 10% of the population. Timely diagnosis is crucial for effective management. Leveraging machine learning within healthcare offers promising advancements in predictive diagnostics. We developed a web-based Clinical Decision Support System (CDSS) for CKD, incorporating advanced Explainable AI (XAI) methods, specifically SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations). The system trains and evaluates multiple classifiers (KNN, Random Forest, AdaBoost, XGBoost, CatBoost, and Extra Trees) to predict CKD. Model effectiveness is assessed using accuracy, confusion-matrix statistics, and AUC. AdaBoost achieved 100% accuracy, and all classifiers except KNN consistently reached perfect precision and sensitivity. Additionally, we present a real-time web-based application to operationalize the model, enhancing trust and accessibility for healthcare practitioners and stakeholders.
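A minimal sketch of the multi-classifier evaluation loop the abstract describes is shown below, using synthetic data as a stand-in for the CKD dataset; XGBoost and CatBoost are omitted to keep the example scikit-learn-only, and the numbers it prints are not the study's results.

```python
# Minimal sketch of the evaluation loop described above: several classifiers
# compared on accuracy and AUC. The synthetic data stands in for the CKD dataset,
# which is not distributed with this abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=0)  # stand-in for CKD features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
}

for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y_te, clf.predict(X_te)):.3f}, "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```

The fitted models can then be passed to SHAP or LIME explainers, and the best performer wrapped behind the web application for real-time scoring.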
Speaker(s):
Krishna Mridha, PhD
Case Western Reserve University
Author(s):
Extraction of EHR Data using the Health Level Seven (HL7®) Fast Healthcare Interoperability Resources (FHIR®) standard in Ongoing Clinical Studies: Latest Results and Remaining Gaps
Presentation Time: 02:45 PM - 03:00 PM
Abstract Keywords: Clinical Trials Innovations, Data Quality, Data Standards, Secondary Use of EHR Data
Primary Track: Clinical Research Informatics
Programmatic Theme: Emerging Best Practices for Clinical Research Informatics Operations
Efforts to reuse healthcare data for research date back to the earliest use of computers in medicine. The Health Level Seven (HL7®) Fast Healthcare Interoperability Resources (FHIR®) standard has made such reuse possible within routine study operations, but concerns over the quality of EHR data linger. While multiple recent studies have consistently observed improved data accuracy, gaps remain in data quality assessment methods, study design, and generalizability beyond demographic and laboratory data.
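As background, the snippet below sketches what FHIR-based extraction of lab results can look like over the standard REST search API; the endpoint, patient id, and LOINC code are placeholders, and a real study pipeline would add SMART-on-FHIR authentication and mapping into the study's data capture system.

```python
# Minimal sketch of pulling lab Observations for a study subject over a FHIR REST
# API. The base URL, patient id, and LOINC code are placeholders; real studies add
# OAuth2/SMART authentication and map the results into their eCRF/EDC system.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"      # hypothetical FHIR endpoint
params = {
    "patient": "Patient/123",                   # study subject's EHR patient id
    "code": "http://loinc.org|2160-0",          # serum creatinine (LOINC)
    "_sort": "-date",
    "_count": 10,
}

bundle = requests.get(f"{FHIR_BASE}/Observation", params=params,
                      headers={"Accept": "application/fhir+json"}, timeout=30).json()

# A FHIR searchset Bundle wraps matching resources in "entry" elements.
for entry in bundle.get("entry", []):
    obs = entry["resource"]
    value = obs.get("valueQuantity", {})
    print(obs.get("effectiveDateTime"), value.get("value"), value.get("unit"))
```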
Speaker(s):
Meredith Zozus, PhD
UT Health Science Center
Author(s):