Multi-dataset and Transfer Learning Using Gene Expression Knowledge Graphs

📅 2025-03-26
🤖 AI Summary
To address small sample sizes and cross-dataset heterogeneity in gene expression data, this paper constructs a gene expression knowledge graph that unifies multiple gene expression datasets with biomedical domain knowledge. The graph supports three settings: single-dataset learning, multi-dataset joint learning, and transfer learning for patient diagnosis. The paper proposes a knowledge-guided framework that combines knowledge graph embedding (TransR) with graph neural networks (GNNs), so that biological relationships between entities are encoded explicitly in the representations the models learn from. Experiments show improved diagnostic accuracy in all three settings (single-dataset, multi-dataset, and transfer), supporting knowledge graph-based data fusion as a way to model small-sample biomedical data.

📝 Abstract
Gene expression datasets offer insights into gene regulation mechanisms, biochemical pathways, and cellular functions. Additionally, comparing gene expression profiles between disease and control patients can deepen the understanding of disease pathology. Therefore, machine learning has been used to process gene expression data, with patient diagnosis emerging as one of the most popular applications. Although gene expression data can provide valuable insights, challenges arise because the number of patients in an expression dataset is usually limited, and datasets that measure different sets of genes cannot be easily combined. This work proposes a novel methodology to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. Vector representations are then produced using knowledge graph embedding techniques and used as inputs for a graph neural network and a multi-layer perceptron. We evaluate the efficacy of our methodology in three settings: single-dataset learning, multi-dataset learning, and transfer learning. The experimental results show that combining gene expression datasets and domain-specific knowledge improves patient diagnosis in all three settings.
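To make the integration idea in the abstract concrete, here is a minimal sketch of how two datasets covering different genes could be merged into one set of knowledge graph triples. Everything in it is an assumption for illustration: the toy expression values, the `overexpresses`/`underexpresses` relations, the fixed threshold, and the `annotated_with` domain links are hypothetical and are not taken from the paper.

```python
# Hypothetical toy data: two datasets that measure partly different genes.
dataset_a = {
    "patient_a1": {"TP53": 2.1, "BRCA1": -0.4},
    "patient_a2": {"TP53": -1.3, "BRCA1": 0.9},
}
dataset_b = {
    "patient_b1": {"TP53": 1.7, "EGFR": 1.2},
}

# Hypothetical domain knowledge linking genes to functional annotations.
domain_knowledge = [
    ("TP53", "annotated_with", "apoptosis"),
    ("BRCA1", "annotated_with", "dna_repair"),
    ("EGFR", "annotated_with", "cell_proliferation"),
]


def expression_triples(dataset, threshold=1.0):
    """Turn strongly over- or under-expressed values into triples.

    The thresholding rule is an assumption made for this sketch.
    """
    triples = []
    for patient, profile in dataset.items():
        for gene, value in profile.items():
            if value >= threshold:
                triples.append((patient, "overexpresses", gene))
            elif value <= -threshold:
                triples.append((patient, "underexpresses", gene))
    return triples


def build_graph(*triple_lists):
    """Merge any number of triple lists into one deduplicated graph."""
    graph = set()
    for triples in triple_lists:
        graph.update(triples)
    return graph


def patients_linked_to(graph, gene):
    """Return patients from any dataset connected to the given gene."""
    relations = {"overexpresses", "underexpresses"}
    return sorted(h for h, r, t in graph if r in relations and t == gene)


graph = build_graph(
    expression_triples(dataset_a),
    expression_triples(dataset_b),
    domain_knowledge,
)
print(patients_linked_to(graph, "TP53"))
```

Because patients from both datasets attach to the same gene nodes, the datasets share structure in the graph even though their gene sets differ, which is the property the abstract relies on for multi-dataset learning.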
Problem

Research questions and friction points this paper is trying to address.

Integrating multiple gene expression datasets using knowledge graphs
Overcoming limited patient data and incompatible gene expression datasets
Improving patient diagnosis via multi-dataset and transfer learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multiple gene expression datasets using knowledge graphs
Utilizes knowledge graph embedding for vector representations
Feeds the embeddings to a graph neural network and a multi-layer perceptron
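The embedding step listed above can be illustrated with the TransR scoring function, since the AI summary names TransR (the abstract itself only says "knowledge graph embedding techniques"). The sketch below shows the standard TransR score for a single triple using hand-picked vectors; the dimensions, the values, and the helper names `project` and `transr_score` are assumptions for illustration, and no training loop is shown.

```python
def project(matrix, vector):
    """Multiply a relation-specific projection matrix by an entity vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr_score(head, relation, tail, projection):
    """TransR score: squared L2 norm of (M_r h + r - M_r t).

    Lower scores mean the triple is considered more plausible.
    """
    h_r = project(projection, head)
    t_r = project(projection, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


# Hypothetical 3-dimensional entity vectors and a 2-dimensional relation space.
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
patient = [0.5, 0.1, 0.9]
gene = [0.7, 0.4, -0.2]
unrelated_gene = [0.0, 0.0, 0.0]
overexpresses = [0.2, 0.3]

good = transr_score(patient, overexpresses, gene, projection)
bad = transr_score(patient, overexpresses, unrelated_gene, projection)
print(good < bad)
```

After training, entity vectors such as `patient` would serve as the inputs to the downstream graph neural network and multi-layer perceptron described in the abstract.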