Leveraging Multi-Source Data to Resolve Inconsistency Across Pharmacogenomic Datasets in Drug Sensitivity Prediction
Presentation Time: 08:24 AM - 08:36 AM
Abstract Keywords: Bioinformatics, Machine Learning, Artificial Intelligence
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Researchers have developed pharmacogenomic datasets for various purposes, such as biomarker identification, yet drug response prediction models often underperform because the datasets are inconsistent with one another. These inconsistencies arise from inter-tumoral heterogeneity, differing experimental conditions, and cell subtype complexity, and they limit model generalizability. To address this, we propose a computational model based on Aggregated Learning (AL) that improves drug response prediction by learning from inconsistencies across multiple datasets. Our model minimizes discrepancies by training on overlapping but inconsistent data points from three pharmacogenomic datasets: CCLE, GDSC2, and gCSI. Compared with four baseline methods, Selecting Better (SB), Result Average (RA), Combining Data (CD), and Model Average (MA), our approach achieved lower Mean Absolute Error (MAE) on each dataset pair: 0.090 (CCLE-GDSC), 0.096 (CCLE-gCSI), and 0.081 (GDSC-gCSI). These results indicate that addressing inconsistencies improves prediction accuracy and generalizability, making our model a promising approach for robust drug response prediction.
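To make the evaluation setup concrete, the following is a minimal sketch of how overlapping data points between two datasets might be identified and how MAE is computed over them. It is not the authors' implementation: the dictionary layout, the function names, and the toy response values are all assumptions introduced for illustration, and the abstract does not specify which response measure is used.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each dataset maps a (cell_line, drug) pair
# to a single normalized drug-response value.
Key = Tuple[str, str]
Dataset = Dict[Key, float]


def overlapping_pairs(a: Dataset, b: Dataset) -> List[Key]:
    """Return the (cell_line, drug) pairs measured in both datasets."""
    return sorted(a.keys() & b.keys())


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """MAE = (1/n) * sum(|y_true_i - y_pred_i|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_dataset_mae(a: Dataset, b: Dataset) -> float:
    """MAE between two datasets' responses on their shared pairs."""
    keys = overlapping_pairs(a, b)
    return mean_absolute_error([a[k] for k in keys], [b[k] for k in keys])


# Toy values, invented for illustration only (not real measurements).
dataset_a: Dataset = {
    ("A549", "erlotinib"): 0.80,
    ("MCF7", "lapatinib"): 0.40,
    ("HT29", "paclitaxel"): 0.10,
}
dataset_b: Dataset = {
    ("A549", "erlotinib"): 0.70,
    ("MCF7", "lapatinib"): 0.55,
    ("PC3", "sorafenib"): 0.30,
}

shared = overlapping_pairs(dataset_a, dataset_b)
discrepancy = cross_dataset_mae(dataset_a, dataset_b)
```

Here `shared` contains only the two pairs present in both toy datasets, and `discrepancy` measures how far apart the two sources are on those pairs. The same `mean_absolute_error` function applies when comparing model predictions against measured responses.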
Speaker(s):
Xiaodi Li, Ph.D.
Mayo Clinic
Author(s):
Xiaodi Li, Ph.D. - Mayo Clinic; Trisha Das, Ph.D. Student - University of Illinois Urbana-Champaign; Kritib Bhattarai, BS - Luther College; Sivaraman Rajaganapathy, Research Fellow/Ph.D. - Mayo Clinic; Vincent Buchner, BS - Luther College; Yanshan Wang, Ph.D. - University of Pittsburgh; Chang Su, Ph.D. - Weill Cornell Medicine; Lichao Sun, Ph.D. - Lehigh University; Liewei Wang, M.D., Ph.D. - Mayo Clinic; James Cerhan, M.D., Ph.D. - Mayo Clinic; Nansu Zong, Ph.D. - Mayo Clinic
Category
Paper - Regular