🤖 AI Summary
RNA-seq data are high-dimensional and sparse, with complex inter-gene dependencies, which hinders reliable identification of early-stage cancer biomarkers. To address this, we propose RGE-GCN, a framework that combines graph convolutional networks (GCNs) modeling gene co-expression relationships with interpretable recursive gene selection. Guided by Integrated Gradients, RGE-GCN iteratively prunes non-discriminative genes, coupling classification and feature selection in a single pipeline. The result is a compact, interpretable biomarker set. Evaluated on synthetic data and real-world cohorts of lung, kidney, and cervical cancers, RGE-GCN consistently achieves higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Moreover, the selected genes align with canonical oncogenic pathways (e.g., PI3K-AKT), supporting their biological plausibility.
📝 Abstract
Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data remains a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways, including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq-based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/).
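The recursive elimination loop described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a plain logistic regression stands in for the Graph Convolutional Network, no graph is built, Integrated Gradients is approximated with a midpoint Riemann sum against an all-zero baseline, and every name and parameter (`fit_logistic`, `recursive_gene_elimination`, `n_keep`, `drop_frac`, the toy data) is hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Gradient-descent logistic regression (assumed stand-in for the GCN)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y          # dLoss/dlogit per sample
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def integrated_gradients(X, w, b, steps=50):
    """Approximate IG of F(x) = sigmoid(w.x + b) with a zero baseline."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints on the path 0 -> x
    grad_sum = np.zeros_like(X, dtype=float)
    for a in alphas:
        p = sigmoid(a * (X @ w) + b)           # model output at the point a * x
        grad_sum += (p * (1.0 - p))[:, None] * w[None, :]  # dF/dx at a * x
    return X * grad_sum / steps                # (x - baseline) * mean gradient

def recursive_gene_elimination(X, y, n_keep=5, drop_frac=0.5):
    """Retrain, score genes by mean |IG|, drop the weakest, and repeat."""
    kept = np.arange(X.shape[1])
    while len(kept) > n_keep:
        w, b = fit_logistic(X[:, kept], y)
        scores = np.abs(integrated_gradients(X[:, kept], w, b)).mean(axis=0)
        n_next = max(n_keep, int(len(kept) * (1.0 - drop_frac)))
        top = np.argsort(scores)[::-1][:n_next]  # highest-scoring genes first
        kept = np.sort(kept[top])
    return kept

# Toy data (assumption): 40 "genes", of which only genes 0-2 differ by class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 40))
X[:, :3] += 2.0 * y[:, None]

print("genes kept:", recursive_gene_elimination(X, y))
```

Retraining after every pruning round lets the attribution scores reflect the reduced gene set, which is the point of a recursive scheme as opposed to a single ranking pass. In a setup closer to the one described, the logistic model would be replaced by a GCN and the attributions computed with an automatic-differentiation library.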