Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
3/12/2025 | 3:30 PM – 5:00 PM | Frick
S31: Invited Session: AI in Oncology
Presentation Type: Podium Abstract
Session Credits: 1.5
HemOnc.org and Machine Learning Use Cases
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: Knowledge Representation, Management, or Engineering, Data Standards, Data-Driven Research and Discovery
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Real-World Evidence in Informatics: Bridging the Gap between Research and Practice
HemOnc.org is a free, collaborative wiki resource for hematology and oncology professionals. It provides detailed information on anticancer drugs, treatment regimens, guidelines, and patient resources. In the rapidly evolving field of cancer treatment, staying abreast of the latest research and advancements is crucial for both healthcare professionals and researchers. Machine learning has emerged as a powerful tool with the potential to revolutionize cancer research and improve patient outcomes. By applying machine learning techniques to analyze complex datasets and extract meaningful insights, we can enhance our understanding of cancer biology, optimize treatment strategies, and personalize patient care. This proposal outlines the potential of applying machine learning techniques to HemOnc.org data for three specific use cases: information-theoretic network meta-analysis, social network analysis, and ground truth for real-world evidence studies. Each use case will be described in terms of its methodology, potential benefits and challenges, and expected outcomes.
Speaker(s):
Jeremy Warner, MD, MS
Brown University
Author(s):
Jeremy Warner, MD, MS - Brown University;
Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models
Presentation Time: 03:45 PM - 04:00 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Natural Language Processing, Clinical and Research Data Collection, Curation, Preservation, or Sharing
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Patient-derived cancer models (PDCMs) have emerged as indispensable tools in cancer research and preclinical studies. Developments in artificial intelligence, particularly large language models (LLMs), hold promise for extracting knowledge from scientific texts. In this work, we applied recent advances in LLMs to automatically extract PDCM-relevant entities from scientific texts. We explore direct prompting and a novel soft-prompting approach, and show that training soft prompts for smaller open models can achieve the performance of proprietary LLMs.
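To illustrate what "training soft prompts" means mechanically, here is a minimal Python/NumPy sketch under toy assumptions; it is not the authors' code or model. A frozen stand-in "model" (a random embedding table plus a mean-pooled linear classifier, in place of an open LLM) receives a few trainable prompt vectors prepended to its input embeddings, and gradient descent updates only those vectors. All sizes, data, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, N_PROMPT = 50, 8, 4  # hypothetical toy sizes

# Frozen "pretrained" parameters: never updated during soft-prompt training.
EMBED = rng.normal(size=(VOCAB, DIM))
W_OUT = rng.normal(size=DIM)
B_OUT = 0.0


def forward(prompt, token_ids):
    """Prepend soft-prompt vectors to the token embeddings, mean-pool, and score."""
    seq = np.vstack([prompt, EMBED[token_ids]])  # (N_PROMPT + n_tokens, DIM)
    logit = seq.mean(axis=0) @ W_OUT + B_OUT
    return 1.0 / (1.0 + np.exp(-logit)), seq.shape[0]


def bce(p, y):
    """Binary cross-entropy for one example."""
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def train_soft_prompt(data, steps=200, lr=0.5):
    """Gradient descent on the soft prompt only; EMBED and W_OUT stay frozen."""
    prompt = np.zeros((N_PROMPT, DIM))
    for _ in range(steps):
        grad = np.zeros_like(prompt)
        for token_ids, y in data:
            p, length = forward(prompt, token_ids)
            # dL/dlogit = p - y; each prompt row enters the pooled vector with weight 1/length.
            grad += (p - y) * W_OUT / length
        prompt -= lr * grad / len(data)
    return prompt


# Hypothetical labeled token sequences (e.g. "mentions a PDCM entity" = 1).
data = [(rng.integers(0, VOCAB, size=6), int(rng.random() < 0.8)) for _ in range(20)]


def mean_loss(prompt):
    return float(np.mean([bce(forward(prompt, x)[0], y) for x, y in data]))


embed_before = EMBED.copy()
before = mean_loss(np.zeros((N_PROMPT, DIM)))
learned = train_soft_prompt(data)
after = mean_loss(learned)
print(f"loss before {before:.3f}, after {after:.3f}")
```

Because this toy model mean-pools its inputs, every prompt row receives the same gradient; in a real transformer, attention can let each soft-prompt vector play a distinct role.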
Speaker(s):
Guergana Savova, PhD
Boston Children's Hospital and Harvard Medical School
Author(s):
Jiarui Yao, PhD - Boston Children's Hospital/Harvard Medical School; Zinaida Perova - EMBL-EBI; Tushar Mandloi, MS - European Molecular Biology Laboratory - EBI; Elizabeth Lewis, MS - European Molecular Biology Laboratory-European Bioinformatics Institute; Helen Parkinson, PhD - European Molecular Biology Laboratory-European Bioinformatics Institute; Guergana Savova, PhD - Boston Children's Hospital and Harvard Medical School;
Advancing clinical trial education with generative large language models
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Clinical Trials Innovations, Machine Learning, Generative AI, and Predictive Modeling, Health Literacy Issues and Solutions
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Digital Health Technologies for Patient Research
Introduction: This study investigated the use of Large Language Models (LLMs) to generate patient-friendly educational materials for clinical trials. Informed consent forms (ICFs) from cancer clinical trials were used as input for GPT-4, which was prompted to create both concise summaries and multiple-choice question-answer pairs (MCQAs).
Methods: Two methods were used for summary generation: direct summarization and sequential extraction/summarization. Clinicians evaluated summary quality, while patients assessed readability and usefulness. For MCQAs, GPT-4 was prompted using in-context learning with expert-written examples. Crowdsourced readers evaluated MCQA accuracy.
Results: Both summary generation methods produced comparable results, with the sequential approach showing slightly fewer inaccuracies. Patients found the summaries easy to understand and helpful for learning about trials. MCQAs demonstrated high accuracy and agreement with crowdsourced readers.
Conclusion: LLMs can effectively generate patient-friendly educational content from ICFs. This has implications for improving patient understanding and engagement in cancer clinical trials. While the findings highlight the potential of LLMs to create scalable educational resources, they also emphasize the need for ongoing human oversight to ensure accuracy and address identified error modes. This research provides a proof-of-concept for leveraging LLMs to enhance clinical trial education, potentially leading to increased recruitment and more successful trials.
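The in-context learning step described in Methods can be pictured as prompt assembly: an instruction, a few expert-written demonstrations, then the new ICF excerpt. The Python sketch below is hypothetical; the instruction wording, the MCQAExample structure, and the sample text are illustrative assumptions rather than the study's actual prompts, and the call to GPT-4 is omitted.

```python
from dataclasses import dataclass


@dataclass
class MCQAExample:
    """One expert-written demonstration (hypothetical structure)."""
    passage: str
    question: str
    options: list[str]  # exactly four options, labeled A-D
    answer: str  # letter of the correct option, e.g. "B"


def format_example(ex: MCQAExample) -> str:
    """Render a demonstration as excerpt, question, lettered options, answer."""
    letters = "ABCD"
    lines = [f"Consent form excerpt: {ex.passage}", f"Question: {ex.question}"]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(ex.options)]
    lines.append(f"Answer: {ex.answer}")
    return "\n".join(lines)


def build_mcqa_prompt(demos: list[MCQAExample], icf_excerpt: str) -> str:
    """Assemble a few-shot prompt: instruction, demonstrations, then the new excerpt."""
    instruction = (
        "Write one multiple-choice question that checks a patient's understanding "
        "of the consent form excerpt. Give four options and the correct letter."
    )
    blocks = [instruction] + [format_example(d) for d in demos]
    # The model is expected to continue after the trailing "Question:".
    blocks.append(f"Consent form excerpt: {icf_excerpt}\nQuestion:")
    return "\n\n".join(blocks)


# Hypothetical demonstration and excerpt, for illustration only.
demo = MCQAExample(
    passage="You may leave the study at any time without losing your regular care.",
    question="What happens to your regular care if you leave the study?",
    options=["It stops", "It continues as usual", "It costs more", "It is delayed"],
    answer="B",
)
prompt = build_mcqa_prompt([demo], "Blood samples will be collected every 3 weeks.")
print(prompt)
```

Keeping demonstrations in a fixed structure makes it easy to swap in different expert-written examples without rewriting the prompt text.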
Speaker(s):
Danielle Bitterman, MD
Harvard Medical School
Author(s):
Mingye Gao, MS - MIT; Aman Varshney, MS - Technical University of Munich; Shan Chen, MS - Harvard-MGB; Vikram Goddla; Jack Gallifant, MBBS - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Patrick Doyle, BA - Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA; Claire Novack, BA - Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA; Teresia Perkins, BS - Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA; Xinrong Correia, BS - Centaur Labs; Erik Duhaime, PhD - Centaur Labs; Howard Isenstein, MA - Digidence; David Kozono, MD, PhD - Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA; Elad Sharon, MD - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Lisa Lehmann, MD, PhD, MSc; Brian Anthony, PhD - MIT; Dmitriy Dligach, Ph.D. - Loyola University Chicago; Danielle Bitterman, MD - Harvard Medical School;
Demonstrating the Value of DeepPhe for Translational studies in Breast/Ovarian Cancer and Melanoma
Presentation Time: 04:15 PM - 04:30 PM
Abstract Keywords: Natural Language Processing, Data-Driven Research and Discovery, Clinical and Research Data Collection, Curation, Preservation, or Sharing
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Emerging Best Practices for Clinical Research Informatics Operations
DeepPhe is an open-source natural language processing (NLP) and visual analytics pipeline for extracting detailed cancer information from electronic medical records (EMRs). We hypothesize that information extracted by DeepPhe will help answer questions that cannot be addressed with structured data alone. To evaluate this hypothesis, we applied DeepPhe and its data streams to meaningful clinical investigations, focusing on questions about patient genomics and treatments.
Speaker(s):
Harry Hochheiser, PhD
University of Pittsburgh Department of Biomedical Informatics
Author(s):
Alex VanHelene, BS - University of Pittsburgh; Jiarui Yao, PhD - Boston Children's Hospital/Harvard Medical School; Eli Goldner, MS - Boston Children's Hospital; Sean Finan, B.S. - Boston Children's Hospital; John Levander, BS - University of Pittsburgh; Dennis Johns, BS - Boston Children's Hospital; David Harris, bbs - Boston Children's Hospital; Piet de Groen, MD - University of Minnesota; Elizabeth Buchbinder, MD - Harvard Medical School and Dana-Farber Cancer Institute; Danielle Bitterman, MD - Harvard Medical School; Jeremy Warner, MD, MS - Brown University; Guergana Savova, PhD - Boston Children's Hospital and Harvard Medical School;
Enhancing Validation of Case-Control Omics Signatures through "Minimalist" Single-Subject Analysis (N-of-1 Trials): Implications for Rare Disease
Presentation Time: 04:30 PM - 04:45 PM
Abstract Keywords: Clinical Trials Innovations, Biomarker Discovery and Development, Transcriptomics, Clinical Genomics/Omics and Interventions Based on Omics Data, Learning Healthcare System, Genomics/Omic Data Interpretation, Patient-centered Research and Care
Primary Track: Translational Bioinformatics/Precision Medicine
Programmatic Theme: Implementation Science and Deployment in Informatics: Enabling Clinical and Translational Research
INTRODUCTION: Validating transcriptomic signatures from small case-control cohorts poses significant challenges, particularly in rare diseases with limited patient accrual. Single-subject studies (S3) offer a compelling alternative by analyzing paired transcriptomes from the same individual under different conditions (e.g., “ill” vs. “recovered”). Unlike general linear models (GLMs), S3 designs increase statistical power by reducing the number of tested features, relaxing within-cohort concordance requirements, and leveraging isogenic conditions.
METHODS: In this proof-of-concept study, we hypothesized that S3 could validate a case-control-derived sepsis gene signature (SGS) of 185 differentially expressed genes (DEGs; FDR < 5%) identified in a micro-cohort (sepsis: n=6; healthy: n=6). The SGS was tested in a single-subject design using conditions similar to the case-control study (n=1; sepsis vs. recovery).
RESULTS: Each of the 18 S3 analyses (one per individual) successfully reproduced the SGS, as assessed by the published N-of-1-MixEnrich method (p < 0.05). Confounders such as age, gender, and septic shock did not affect the results. In contrast, conventional paired analyses (e.g., GLMs) require larger sample sizes (n ≥ 6) to achieve consistent results because they must estimate each variable's dispersion.
CONCLUSION: These findings demonstrate that a single-subject (N-of-1) study design can validate gene sets identified in conventional case-control studies, substantially reducing cohort-size requirements for research on rare diseases or sub-stratified common disorders (e.g., addressing EDI). This approach also advances precision "omics" medicine. Additional studies are needed to evaluate its scalability and robustness across diverse patient populations.
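To make the single-subject idea concrete, here is a simplified Python sketch. It is a stand-in built on stated assumptions, not the published N-of-1-MixEnrich method: for one individual, it computes per-gene log2 fold changes between the paired "recovered" and "ill" samples, calls genes "responsive" with a fixed |log2 FC| threshold, and then asks, with a one-sided Fisher's exact (hypergeometric upper-tail) test, whether the signature genes are over-represented among responsive genes. The threshold, pseudocount, gene names, and counts are hypothetical.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-gene log2(after / before) for two samples from the same individual."""
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in before
        if gene in after
    }


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws); this equals
    the one-sided (over-representation) Fisher's exact test p-value."""
    upper = min(successes, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def single_subject_enrichment(recovered, ill, gene_set, threshold=1.0):
    """Simplified N-of-1 check: is gene_set over-represented among genes whose
    |log2 fold change| (ill vs. recovered) is at least `threshold`?"""
    lfc = log2_fold_changes(recovered, ill)
    measured = set(lfc)
    responsive = {g for g, v in lfc.items() if abs(v) >= threshold}
    signature = gene_set & measured
    k = len(responsive & signature)
    p = hypergeom_upper_tail(k, len(measured), len(signature), len(responsive))
    return k, p


# Hypothetical toy counts for one patient: 20 genes, 5 of which form the signature.
recovered = {f"G{i}": 100 for i in range(20)}
ill = dict(recovered)
for g in ("G0", "G1", "G2", "G3", "G4"):
    ill[g] = 400  # up in the ill state (log2 FC ~ 1.99)
ill["G10"] = 30  # an unrelated gene that is also responsive
signature = {"G0", "G1", "G2", "G3", "G4"}
k, p = single_subject_enrichment(recovered, ill, signature)
print(k, p)
```

With 20 measured genes, 6 responsive, and all 5 signature genes among them, the upper-tail probability is C(15,1)/C(20,6) = 15/38760, about 0.0004. Note that no within-cohort dispersion estimate is needed, since the test treats genes, not patients, as the sampling units.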
Speaker(s):
Yves Lussier, MD
The University of Utah
Author(s):
Liam Nelson, B. Computer Science - The University of Utah; Nima Pouladi, MD, PhD - The University of Utah; Rachel Nelson, BS - University of Utah; Madi Shabanian, PhD - University of Utah; Colleen Kenost, EdD - University of Utah; Elizabeth Middleton, MD - University of Utah; Neil Tolley, B. - University of Utah; Robert Campbell, MD - Wash U; Jesse Rowley, PhD - The University of Utah; Matthew Rondina, MD - The University of Utah; Yves Lussier, MD - University of Utah School of Medicine;
Advancing Cancer Outcome Predictions Using PDigy and Attention-Based Deep Learning on Whole Slide Images
Presentation Time: 04:45 PM - 05:00 PM
Abstract Keywords: Bioimaging Techniques and Applications, Data-Driven Research and Discovery, Clinical Decision Support for Translational/Data Science Interventions
Primary Track: Translational Bioinformatics/Precision Medicine
Programmatic Theme: Translational Bioinformatics Using Multi-Modal Patient Data and AI
Accurate cancer recurrence prediction is essential for guiding personalized treatment strategies and improving patient outcomes. Traditional methods, such as molecular assays, often face challenges related to accessibility, cost, and scalability. This study explores the integration of PDigy (Pathology Digital), a next-generation, AI-ready file format, with Graph Attention Network (GAT) architectures to enhance whole slide image (WSI) analysis for recurrence prediction across various cancer types. PDigy was implemented to streamline WSI preprocessing by segmenting high-resolution images into patches and embedding clinical metadata directly into the file format, reducing computational overhead while ensuring compatibility with advanced AI models. Using a dataset of WSIs from diverse cancer types, the GAT-based deep learning model employed attention mechanisms to prioritize diagnostically relevant regions, such as tumor-adjacent tissues and high-risk stromal areas, which are key indicators of recurrence risk. This combination of PDigy and GAT demonstrated significant improvements in predictive accuracy compared to baseline models, while also optimizing data handling and scalability for large datasets. In breast cancer cases specifically, the approach exceeded the accuracy of traditional methods, underscoring its potential for broader applications in precision oncology. PDigy’s ability to integrate clinical metadata and provide patch-based processing represents a major advancement in WSI workflows, facilitating precise and efficient AI-driven insights. The synergy between PDigy and GAT marks a transformative step in computational pathology, with implications for personalized medicine and improved cancer care delivery.
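As a rough illustration of attention over WSI patches, the following Python/NumPy sketch builds a graph whose nodes are patches on a grid (each linked to itself and its four neighbours) and applies one single-head graph attention layer in the standard GAT form. The patch features, weights, and grid size are random placeholders; how PDigy stores patches and which graph construction and readout the authors used are not stated in the abstract, so none of that is assumed here.

```python
import numpy as np


def grid_adjacency(rows, cols):
    """Boolean adjacency for a rows x cols grid of patches: each patch links to
    itself and to its 4-connected neighbours."""
    n = rows * cols
    adj = np.eye(n, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adj[i, rr * cols + cc] = True
    return adj


def gat_layer(h, adj, w, a, slope=0.2):
    """Single-head graph attention layer.
    h: (n, f_in) patch features; w: (f_in, f_out); a: (2 * f_out,)."""
    z = h @ w  # (n, f_out) projected features
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) for all pairs via broadcasting.
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)  # attend only to graph neighbours
    e -= e.max(axis=1, keepdims=True)  # numerical stability (self-loops keep rows finite)
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over each patch's neighbours
    return alpha @ z, alpha


rng = np.random.default_rng(0)
rows, cols, f_in, f_out = 3, 4, 16, 8  # toy sizes; real WSIs yield far more patches
feats = rng.normal(size=(rows * cols, f_in))  # stand-in for per-patch embeddings
adj = grid_adjacency(rows, cols)
out, alpha = gat_layer(feats, adj, rng.normal(size=(f_in, f_out)), rng.normal(size=2 * f_out))
slide_embedding = out.mean(axis=0)  # simple mean readout for a slide-level head
```

The attention weights `alpha` are what would let a trained model emphasize particular regions; in this untrained sketch they are arbitrary.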
Speaker(s):
Sean Hacking, MB, BCh
NYU Langone
Author(s):