Times are displayed in (UTC-07:00) Pacific Time (US & Canada)
11/11/2024 | 10:30 AM – 12:00 PM | Franciscan D
S33: Cancer and Genomics - Ripped Genes
Presentation Type: Oral
Session Chair:
Dokyoon Kim, PhD - Institute for Biomedical Informatics, University of Pennsylvania
Comparative Analysis of Data Generation Techniques for Breast Cancer Research Using Artificial Intelligence
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Large Language Models (LLMs), Cancer Prevention, Machine Learning, Natural Language Processing, Disease Models, Teaching Innovation, Education and Training
Primary Track: Applications
Programmatic Theme: Public Health Informatics
This study investigates the use of ChatGPT to support clinical teams with limited expertise in generating synthetic data for breast cancer research. It assesses ChatGPT's application, focusing on effective prompting and best practices for creating high-fidelity synthetic data. The research compares the generated synthetic data to the Wisconsin Breast Cancer Dataset through statistical analysis, structural similarity metrics, and machine learning performance. Results indicate that the quality of prompts and generation techniques significantly affects the data's fidelity. The study highlights the critical role of prompt engineering and data synthesis techniques in producing accurate synthetic data for healthcare research, underscoring the need for precise prompts and generation methods to maintain data integrity in sensitive areas like cancer research.
Speaker(s):
Tia Pope, Ph.D. Student
North Carolina A&T
Author(s):
Ahmad Patooghy, Ph.D. - North Carolina A&T State University;
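The abstract above mentions comparing synthetic data with a reference dataset through statistical analysis. A minimal sketch of one such per-feature comparison follows, assuming differences in mean and standard deviation plus a two-sample Kolmogorov-Smirnov statistic; the abstract does not say which statistics the authors used, and the function names and feature values here are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    gap = 0.0
    for x in set(real_sorted + synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def compare_feature(real, synthetic):
    """Summarize how far one synthetic feature column drifts from the real one."""
    return {
        "mean_diff": mean(synthetic) - mean(real),
        "std_diff": stdev(synthetic) - stdev(real),
        "ks": ks_statistic(real, synthetic),
    }


# Hypothetical values for illustration only; not taken from any real dataset.
real_radius = [12.1, 13.4, 14.2, 17.9, 20.5, 11.8]
synthetic_radius = [12.0, 13.9, 14.0, 18.3, 19.7, 12.2]
report = compare_feature(real_radius, synthetic_radius)
print(report)
```

A KS statistic near 0 suggests the two columns have similar distributions, while a value near 1 suggests they barely overlap.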
Unlocking Early Cancer Detection: Leveraging Machine Learning in Cell-Free DNA Analysis for Precision Oncology
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Bioinformatics, Machine Learning, Cancer Genetics, Biomarkers, Cancer Prevention, Deep Learning, Computational Biology, Precision Medicine
Working Group: Genomics and Translational Bioinformatics Working Group
Primary Track: Applications
Programmatic Theme: Translational Bioinformatics
This study introduces an approach to early cancer detection through the analysis of cell-free DNA (cfDNA), utilizing machine learning algorithms to navigate the complexities of low circulating tumor DNA (ctDNA) fractions and genetic heterogeneity. CfDNA, found in bodily fluids and comprising fragments from apoptotic or necrotic cells, offers a non-invasive means to identify cancer signals. With ctDNA (a subset of cfDNA from cancer cells) serving as a biomarker, the potential for detecting cancer at its earliest stages is vastly improved, enhancing treatment effectiveness and patient prognosis. However, the challenges of distinguishing cancer-specific signatures within cfDNA due to low ctDNA levels and the noise of genetic heterogeneity necessitate advanced methods beyond traditional mutation analysis. Leveraging high-throughput sequencing technologies and the precision of machine learning, we aim to surmount these obstacles by identifying nuanced cancer signatures within cfDNA sequencing data. Machine learning's capability to model complex data relationships allows for the differentiation of subtle oncogenic patterns from background noise, thereby increasing the diagnostic accuracy of liquid biopsies. This paper outlines our exploration into employing machine learning for early cancer detection via cfDNA, detailing our method of transforming sequencing data into analyzable formats, enhancing signal detection through a sliding window technique, and predicting true tumor-origin fragments. Our findings underscore the potential of integrating artificial intelligence with liquid biopsy technologies to revolutionize cancer diagnostics, offering new hope for early detection and personalized treatment pathways.
Speaker(s):
Hui Li, PhD
University of Texas Health Science Center at Houston
Author(s):
Jinlian Wang, PhD - UTHealth; Hongfang Liu, PhD - University of Texas Health Science Center at Houston;
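The abstract above refers to a sliding window technique for enhancing signal detection. The abstract does not describe what is windowed or how, so the following is only a generic sketch of the idea: averaging a hypothetical per-position signal over overlapping windows so that isolated noisy values are smoothed. The function name, window size, and input values are all assumptions.

```python
def sliding_window_means(signal, window, step=1):
    """Mean of each full window of `window` values, advancing `step` positions at a time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        sum(signal[start:start + window]) / window
        for start in range(0, len(signal) - window + 1, step)
    ]


# Hypothetical per-position signal, for illustration only.
per_position_signal = [1, 2, 3, 4, 5]
smoothed = sliding_window_means(per_position_signal, window=3)
print(smoothed)
```

Larger windows smooth more aggressively but blur the location of a true signal, so the window size is a trade-off that would need tuning on real data.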
A Comprehensive System for Searching and Evaluating Genomic Variant Evidence Using AI and Knowledge Bases to Support Personalized Medicine
Presentation Time: 11:00 AM - 11:15 AM
Abstract Keywords: Bioinformatics, Cancer Genetics, Deep Learning, Precision Medicine, Natural Language Processing, Machine Learning, Data Mining
Working Group: Genomics and Translational Bioinformatics Working Group
Primary Track: Applications
Programmatic Theme: Translational Bioinformatics
We introduce an innovative automated system for the search and assessment of genetic variant evidence, meticulously aligned with ACMG guidelines. Leveraging the synergistic power of artificial intelligence (AI), elastic search, and an extensive knowledge base, our system advances the efficiency and accuracy of genetic variant interpretation. Distinct from existing methodologies, it features a pioneering literature filtering mechanism that automates the identification and relevance ranking of scientific articles, significantly reducing the time spent on literature evidence search and optimizing the evidence assessment process. Implemented and rigorously tested by a commercial company's hereditary cancer variant curation team, the system demonstrated its effectiveness and scalability by processing over 3 million PMIDs and 1.8 million full-text articles. Throughout the period of active utilization, significant insights were gleaned into the real-world impact and user experience of the system, conclusively affirming its robustness. Our comparative analysis with Mastermind 2.0 highlights the system's enhanced performance in minimizing false positives for various mutation types. The core AI model exhibits exceptional precision, recall, and F1 scores above 0.8, signifying its adeptness in selecting pertinent literature for variant classification. The experience and knowledge acquired from deploying the system in a commercial setting provide a distinctive outlook on its practicality and prospects for future development. The novel integration of AI with traditional genetic variant curation processes heralds a new era in the field, promising significant advancements and broader application prospects.
Speaker(s):
Jinlian Wang, PhD
UTHealth
Author(s):
Hui Li, PhD - University of Texas Health Science Center at Houston; Hongfang Liu, PhD - University of Texas Health Science Center at Houston;
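For readers less familiar with the metrics quoted in the abstract above, precision, recall, and F1 can be computed from counts of true positives, false positives, and false negatives. The sketch below uses hypothetical counts for a literature filter; they are not results from the presented system.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall, written here in count form.
    f1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg)
    return precision, recall, f1


# Hypothetical counts: 8 relevant articles kept, 2 irrelevant kept, 2 relevant missed.
precision, recall, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
print(precision, recall, f1)
```

In a literature-filtering setting, precision reflects how many retrieved articles are actually relevant, and recall reflects how many relevant articles were retrieved.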
Quality of 1-year mortality predictions from vendor-supplied versus academic model for cancer patients
Presentation Time: 11:15 AM - 11:30 AM
Abstract Keywords: Clinical Decision Support, Machine Learning, Evaluation
Primary Track: Applications
Programmatic Theme: Clinical Informatics
The Epic End of Life Care Index (EOLCI) is deployed, but we are not aware of an independent validation. We evaluated its performance for predicting 1-year mortality in patients with metastatic cancer, comparing it against an academic machine learning model. This retrospective analysis included adult outpatients with metastatic cancer from four outpatient sites. Among 1,283 evaluable patients, the AUC for 1-year mortality was 0.73 (95% CI, 0.70-0.76) for the EOLCI and 0.82 (95% CI, 0.80-0.85) for the academic model. Positive predictive value was 38% and 65%, respectively. The EOLCI's discrimination performance was lower than both the vendor-stated value (AUC of 0.86) and that of the academic model. Vendor-supplied machine learning models should be independently validated, particularly in specialized patient populations, to ensure accuracy and reliability.
Speaker(s):
Michael Gensheimer, MD
Stanford University
Author(s):
Michael Gensheimer, MD - Stanford University; Jonathan Lu, MS - Stanford University;
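The two metrics reported in the abstract above, AUC and positive predictive value, can be illustrated with a small sketch. It assumes AUC is computed as the probability that a randomly chosen patient who died is scored higher than one who survived, and that PPV is taken among patients flagged at or above a score threshold. The scores, labels, and threshold are hypothetical and are not study data.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def ppv(scores, labels, threshold):
    """Fraction of cases flagged at or above `threshold` that are true positives."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        raise ValueError("no cases flagged at this threshold")
    return sum(flagged) / len(flagged)


# Hypothetical risk scores and 1-year outcomes (1 = died within a year).
risk_scores = [0.9, 0.8, 0.7, 0.1]
died = [1, 0, 1, 0]
print(auc(risk_scores, died), ppv(risk_scores, died, threshold=0.5))
```

AUC summarizes ranking quality across all thresholds, whereas PPV depends on the chosen threshold and on how common the outcome is in the population.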
User Comprehension and EHR Integration of the RealRisks Decision Aid for Breast Cancer Risk Assessment: A Qualitative Study
Presentation Time: 11:30 AM - 11:45 AM
Abstract Keywords: Bioinformatics, Cancer Prevention, Usability, Clinical Decision Support, Patient Engagement and Preferences, User-centered Design Methods
Primary Track: Applications
RealRisks is a decision aid that integrates patient-generated and electronic health record (EHR) data using Fast Healthcare Interoperability Resources (FHIR). It offers modules to enhance understanding of breast cancer risk and a way for individuals to review and modify their EHR data before it is used in their personal risk assessment. RealRisks intends to encourage high-risk patients to take risk-reducing measures. To learn how patients understand risk and what barriers to action they face, we conducted in-depth interviews as part of a usability study to assess the clarity and interpretability of RealRisks. Overall, participants demonstrated an improved understanding of breast cancer risk following their use of RealRisks. However, challenges were noted for certain concepts, in particular lifetime risk, how benign breast disease affects risk, and the differences between hereditary, sporadic, and familial cancer. The EHR download feature was well received, but some participants raised concerns about insurance and privacy/security.
Speaker(s):
Subiksha Umakanth, MS
Columbia University Irving Medical Center
Author(s):
Subiksha Umakanth, MS - Columbia University Irving Medical Center; Anna Vaynrub, BA - Columbia University Medical Center; Harry West, PhD - Columbia University; Jill Diamond, PhD - Sassafras; Alissa Michel, MD - Columbia University; Katherine Crew, MD, MS - Columbia University; Rita Kukafka, DRPH, MA, FACMI - Columbia University;
Meta-Learning on Augmented Gene Expression Profiles for Enhanced Lung Cancer Detection
Presentation Time: 11:45 AM - 12:00 PM
Abstract Keywords: Machine Learning, Cancer Genetics, Deep Learning
Working Group: Knowledge Discovery and Data Mining Working Group
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Gene expression profiles obtained through DNA microarray technology have proven successful in providing critical information for cancer detection classifiers. However, the limited number of samples in these datasets makes it challenging to employ complex methodologies such as deep neural networks for sophisticated analysis. To address this "small data" dilemma, meta-learning has been introduced as a solution to enhance the optimization of machine learning models by utilizing similar datasets, thereby facilitating quicker adaptation to target datasets without requiring large numbers of samples. In this study, we present a meta-learning-based approach for predicting lung cancer from gene expression profiles. We apply this framework to well-established deep learning methodologies and employ four distinct datasets for the meta-learning tasks, with one serving as the target dataset and the rest as source datasets. Our approach is evaluated against both traditional and deep learning methodologies, and the results show the superior performance of meta-learning on augmented source data compared to the baselines trained on single datasets. Moreover, we conduct a comparative analysis between meta-learning and transfer learning methodologies to highlight the efficiency of the proposed approach in addressing the challenges associated with limited sample sizes. Finally, we incorporate an explainability study to illustrate the distinctiveness of decisions made by meta-learning.
Speaker(s):
Zijun Yao, Ph.D.
University of Kansas
Author(s):
Arya Hadizadeh Moghaddam, Doctorate of Philosophy in Computer Science - University of Kansas; Mohsen Nayebi Kerdabadi, BS - The University of Kansas; Cuncong Zhong, PhD - University of Kansas; Zijun Yao, Ph.D. - University of Kansas;