Predicting Gene Relations with a Graph Transformer Network Integrating DNA, Protein, and Descriptive Data
Presentation Time: 02:15 PM - 02:30 PM
Abstract Keywords: Machine Learning, Computational Biology, Precision Medicine, Large Language Models (LLMs), Systems Biology
Primary Track: Foundations
Predicting gene relations within complex pathways is crucial for advancing precision medicine in cancer treatment. This study introduces an innovative approach for gene relation prediction by integrating DNA sequence, protein sequence, and descriptive data through a graph transformer network. Our method leverages the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for training, validation, and testing, and employs a multilayer perceptron (MLP) for the classification of gene relations. Utilizing DNA-BERT for DNA sequences, ESM2 for protein sequences, and Bio-BERT for gene descriptions, our model generates comprehensive gene embeddings. These embeddings are then processed through a graph transformer network to predict unseen gene relations.
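The pipeline described above can be sketched in a few lines. This is a minimal illustration in pure Python, not the authors' implementation: random toy vectors stand in for the DNA-BERT, ESM2, and Bio-BERT embeddings, fusion by concatenation is an assumption, and the graph transformer is reduced to one single-head attention layer restricted to graph neighbors (no residual connections, layer normalization, or feed-forward sublayer). All function names, dimensions, and the number of relation classes are hypothetical.

```python
import math
import random

rng = random.Random(0)

def toy_vec(n):
    """Random stand-in for a pretrained encoder output (assumption)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def concat_embeddings(dna, protein, text):
    """Fuse the three per-gene modality vectors by concatenation (assumption)."""
    return dna + protein + text

def random_matrix(rows, cols):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def graph_attention_layer(X, neighbors, Wq, Wk, Wv):
    """Single-head attention where each gene attends to itself and its neighbors."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for i in range(len(X)):
        idx = [i] + sorted(neighbors[i])
        scores = [sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        out.append([sum(w * V[j][c] for w, j in zip(weights, idx)) for c in range(d)])
    return out

def mlp_classify(h_a, h_b, W1, W2):
    """Score a gene pair: concatenated node states, one ReLU layer, softmax."""
    hidden = [max(0.0, z) for z in matvec(W1, h_a + h_b)]
    return softmax(matvec(W2, hidden))

# Hypothetical sizes: 4 genes, 3 dims per modality, 5-dim node states, 3 relation classes.
n_genes, d_mod, d_model, n_classes = 4, 3, 5, 3
X = [concat_embeddings(toy_vec(d_mod), toy_vec(d_mod), toy_vec(d_mod))
     for _ in range(n_genes)]
neighbors = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}  # toy pathway graph

Wq = random_matrix(d_model, 3 * d_mod)
Wk = random_matrix(d_model, 3 * d_mod)
Wv = random_matrix(d_model, 3 * d_mod)
H = graph_attention_layer(X, neighbors, Wq, Wk, Wv)

W1 = random_matrix(8, 2 * d_model)
W2 = random_matrix(n_classes, 8)
probs = mlp_classify(H[0], H[3], W1, W2)
print(probs)  # untrained, so the class probabilities are arbitrary
```

The weights here are random and untrained, so the output only demonstrates the data flow: per-gene fusion, neighbor-restricted attention over the pathway graph, and pairwise classification of a candidate relation.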
Our findings demonstrate that integrating DNA, protein, and descriptive embeddings significantly improves the accuracy of gene relation prediction. Ablation studies further reveal the individual contribution of each data type to the model's performance: protein and description embeddings matter most, whereas omitting the DNA sequence embeddings yielded a marginal improvement. This suggests that the utility of DNA data should be reevaluated, given its inherent noise and redundancy.
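An ablation of this kind can be organized as a set of modality configurations. The sketch below assumes a leave-one-out design and fusion by concatenation, neither of which the abstract specifies; the names `MODALITIES`, `fuse`, and `ablation_configs` are hypothetical, and training and scoring of each variant are left out.

```python
MODALITIES = ("dna", "protein", "description")

def fuse(gene_features, keep):
    """Concatenate only the modalities listed in `keep`, in a fixed order."""
    fused = []
    for name in MODALITIES:
        if name in keep:
            fused.extend(gene_features[name])
    return fused

def ablation_configs():
    """Return the full model followed by every leave-one-out variant."""
    configs = [MODALITIES]
    for dropped in MODALITIES:
        configs.append(tuple(m for m in MODALITIES if m != dropped))
    return configs

# Toy per-gene features; real vectors would come from the pretrained encoders.
gene = {"dna": [0.1, 0.2], "protein": [0.3, 0.4, 0.5], "description": [0.6]}

for keep in ablation_configs():
    # In a real study, each configuration would be trained and evaluated here.
    print(keep, len(fuse(gene, keep)))
```

Comparing the score of each leave-one-out variant against the full model is what indicates how much a given modality contributes.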
The study underscores the value of a comprehensive, integrative approach in gene relation prediction. While our method shows robust performance, future work will explore refining the model through alternative encoding methods and incorporating edge features to improve accuracy and applicability in precision medicine.
Speaker(s):
Yibo Chen, B.S.
University of Missouri-Columbia
Category
Podium Abstract
Description
Date: Monday (11/11)
Time: 02:15 PM to 02:30 PM
Room: Continental Ballroom 1-2