🤖 AI Summary
Problem: The genetic mechanisms underlying Alzheimer's disease (AD) remain poorly understood, and existing methods struggle to identify genes with strong causal links to the disease.
Method: We propose a “neuron-to-gene backtracking” paradigm. Using a genomic foundation model, we build an interpretable neural network that defines a probabilistic coupling between Most Causal Neurons (MCNs) and Most Causal Genes (MCGs). The approach combines discrete gene token representations with a causality-driven backward activation propagation algorithm. This lets it trace high-causality neurons back to the input-layer genes that activate them. It jointly optimizes gene embeddings and causal scores, and it treats known and putative disease genes within a single modeling framework.
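The backtracking idea can be illustrated with a minimal sketch. It assumes a single ReLU layer over binary gene-token inputs. Two rules in it are illustrative assumptions and are not the authors' causality-driven algorithm:

- Neurons are ranked as MCNs by how much more active they are, on average, in AD samples than in controls.
- Each token's contribution is scored as input times weight, which is exact for a single linear pre-activation.

The names `select_mcns` and `backtrack_to_tokens`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def select_mcns(hidden, labels, k):
    """Hypothetical MCN criterion: rank hidden neurons by the increase in their
    mean activation in AD samples (label 1) over controls (label 0)."""
    diff = hidden[labels == 1].mean(axis=0) - hidden[labels == 0].mean(axis=0)
    return np.argsort(-diff)[:k]

def backtrack_to_tokens(x, w1, mcns, labels):
    """Trace the selected MCNs back to input gene tokens.

    For one linear layer, neuron j's pre-activation (bias excluded) is
    sum_i x[:, i] * w1[j, i], so x[:, i] * w1[j, i] is token i's additive
    contribution. Contributions are summed over MCNs and averaged over AD samples.
    """
    ad = x[labels == 1]                                # (n_ad, n_tokens)
    contrib = ad[:, None, :] * w1[mcns][None, :, :]    # (n_ad, k, n_tokens)
    return contrib.sum(axis=1).mean(axis=0)            # one score per token

# Synthetic example (assumption): token 3 occurs exactly in AD samples
# and strongly drives hidden neuron 0.
rng = np.random.default_rng(0)
n, n_tokens, n_hidden = 200, 6, 4
labels = rng.integers(0, 2, size=n)
x = rng.integers(0, 2, size=(n, n_tokens)).astype(float)
x[:, 3] = labels
w1 = rng.normal(0.0, 0.1, size=(n_hidden, n_tokens))
w1[0, 3] = 2.0
hidden = np.maximum(0.0, x @ w1.T)                     # ReLU hidden layer

mcns = select_mcns(hidden, labels, k=1)
scores = backtrack_to_tokens(x, w1, mcns, labels)
print(mcns, int(np.argmax(scores)))
```

In a deep network, the input-times-weight decomposition no longer holds exactly across layers. A real implementation would need a multi-layer attribution rule, such as the causal propagation the method describes.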
Contribution/Results: The method identifies several novel genes with strong causal links to AD. It achieves an AUC of 0.92 on an independent validation set, significantly outperforming GWAS and state-of-the-art deep learning approaches, and shows promising generalizability to other diseases.
📝 Abstract
Alzheimer's disease (AD) affects over 55 million people globally, yet its key genetic contributors remain poorly understood. Leveraging recent advances in genomic foundation models, we present Reverse-Gene-Finder. It is a neuron-to-gene-token backtracking approach, built into a neural network architecture, for identifying novel causal genetic biomarkers that drive AD onset. Reverse-Gene-Finder rests on three key innovations. First, we exploit the observation that the genes most likely to cause AD, defined as the most causal genes (MCGs), must also be the most likely to activate the neurons most likely to cause AD, defined as the most causal neurons (MCNs). Second, we use a gene token representation at the input layer, so that each gene, whether known or novel to AD, is a discrete and unique entity in the input space. Third, existing neural network architectures track neuron activations forward from the input layer to the output layer. In contrast, we develop a backtracking method that traces backwards from the MCNs to the input layer, identifying the Most Causal Tokens (MCTs) and their corresponding MCGs. Reverse-Gene-Finder is highly interpretable, generalizable, and adaptable, offering a promising avenue for application to other diseases.
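The gene token representation and the final step of mapping MCTs back to MCGs can be illustrated with a small vocabulary sketch. It assumes three things:

- Each gene symbol receives one unique integer token.
- An embedding table holds one vector per token. Here the table is random; in a trained model it would be learned.
- Per-token scores come from some backtracking step.

`GeneTokenizer`, `top_mcgs`, the `GENE_*` symbols, and the scores are all hypothetical placeholders. They are not the Reverse-Gene-Finder implementation or its results.

```python
import numpy as np

class GeneTokenizer:
    """Hypothetical vocabulary: each gene symbol, known or novel to AD,
    gets a unique integer token so it is a discrete input entity."""

    def __init__(self, genes):
        self.gene_to_id = {}
        for g in genes:
            if g not in self.gene_to_id:          # keep ids unique per gene
                self.gene_to_id[g] = len(self.gene_to_id)
        self.id_to_gene = {i: g for g, i in self.gene_to_id.items()}

    def encode(self, genes):
        return np.array([self.gene_to_id[g] for g in genes], dtype=np.int64)

    def decode(self, ids):
        return [self.id_to_gene[int(i)] for i in ids]

def top_mcgs(tokenizer, token_scores, k):
    """Take the k highest-scoring tokens (MCTs) and return their genes (MCGs)."""
    order = np.argsort(-np.asarray(token_scores))[:k]
    return tokenizer.decode(order)

tok = GeneTokenizer(["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
ids = tok.encode(["GENE_C", "GENE_A"])            # token ids [2, 0]
emb = np.random.default_rng(0).normal(size=(len(tok.gene_to_id), 8))
vectors = emb[ids]                                # one embedding row per token
scores = [0.1, 0.7, 0.05, 0.4]                    # hypothetical per-token scores
print(top_mcgs(tok, scores, k=2))                 # ['GENE_B', 'GENE_D']
```

Because each token maps to exactly one gene, ranking tokens ranks genes directly, with no ambiguity in the MCT-to-MCG step.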