🤖 AI Summary
Spatial transcriptomics (ST) data suffer from heavy observational noise, and existing methods train models only on a small set of highly variable genes. They ignore lowly expressed genes that may be co-expressed with the targets, which limits the accuracy of target gene expression estimation. To address this, we propose a bi-level optimization framework for auxiliary gene learning: the expression of low-expression genes is predicted as auxiliary tasks, and a differentiable top-k selection mechanism guided by gene co-expression priors jointly optimizes primary-task performance and the choice of auxiliary genes. This lets the model adaptively identify and weight informative auxiliary genes while keeping the selection step differentiable. Experiments show that the method improves target gene expression estimation over conventional multi-task (auxiliary task) learning across multiple ST datasets.
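As a rough illustration of what a gene co-expression prior could look like, the sketch below scores each candidate auxiliary gene by its mean absolute Pearson correlation with the target genes across spots, then ranks genes by that score. Pearson correlation, the absolute value, and the helper names `pearson`, `coexpression_prior`, and `rank_genes` are assumptions made for illustration; the summary does not specify how the prior is computed.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors (one value per spot)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant vector has no variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def coexpression_prior(aux_expr, target_expr):
    """Hypothetical prior: mean |correlation| of each auxiliary gene with the target genes.

    aux_expr, target_expr: dict mapping gene name -> list of expression values over spots.
    """
    return {
        gene: mean(abs(pearson(x, t)) for t in target_expr.values())
        for gene, x in aux_expr.items()
    }


def rank_genes(prior):
    """Return auxiliary gene names sorted by descending prior score."""
    return sorted(prior, key=prior.get, reverse=True)
```

Taking the absolute value treats strongly anticorrelated genes as informative too; a rank-based correlation or mutual information would be a drop-in alternative if the expression values are heavy-tailed.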
📝 Abstract
Spatial transcriptomics (ST) is a novel technology that enables the observation of gene expression at the resolution of individual spots within pathological tissues. ST quantifies the expression of tens of thousands of genes in a tissue section; however, heavy observational noise is often introduced during measurement. To ensure meaningful assessment, prior studies have restricted both training and evaluation to a small subset of highly variable genes, excluding all other genes from training. However, because genes are likely to be co-expressed, low-expression genes may still contribute to estimating the evaluation targets. In this paper, we propose Auxiliary Gene Learning (AGL), which exploits these ignored genes by reformulating the estimation of their expression as auxiliary tasks trained jointly with the primary tasks. To leverage auxiliary genes effectively, we must select a subset of them that positively influences the prediction of the target genes. This is a challenging optimization problem because the number of possible combinations is vast: choosing $k$ of $N$ candidate genes admits $\binom{N}{k}$ subsets. To overcome this challenge, we propose Prior-Knowledge-Based Differentiable Top-$k$ Gene Selection via Bi-level Optimization (DkGSB), a method that ranks genes using prior knowledge and relaxes the combinatorial selection problem into a differentiable top-$k$ selection problem. Experiments confirm the effectiveness of incorporating auxiliary genes and show that the proposed method outperforms conventional auxiliary task learning approaches.
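The relaxation of top-$k$ selection can be illustrated with one common construction: apply a temperature-scaled softmax $k$ times to the gene scores, and after each round add $\log(1 - p)$ to the logits so that genes that already received probability mass are suppressed. The resulting soft mask sums to $k$, approaches a hard $k$-hot indicator as the temperature falls, and (apart from a numerical clamp) is built only from smooth operations, so gradients could flow back to the scores in an autodiff framework. This is a minimal sketch under assumptions: the abstract does not state which relaxation DkGSB uses, and `soft_top_k`, `weighted_aux_loss`, `tau`, and weighting auxiliary losses by the mask are hypothetical. In a bi-level setup, the model weights would be trained on the weighted objective (inner level), while the scores, possibly initialized from a prior ranking, would be updated to reduce the primary-task validation loss (outer level); that outer loop is omitted here.

```python
import math


def softmax(logits, tau):
    """Numerically stable softmax with temperature tau."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_top_k(scores, k, tau=0.1, eps=1e-12):
    """Hypothetical relaxed k-hot mask over candidate auxiliary genes.

    Runs k softmax rounds; after each round, log(1 - p) is added to the
    logits so genes that already received mass are suppressed in later
    rounds. Each round contributes total mass 1, so the mask sums to k.
    Smaller tau pushes the mask toward the hard top-k indicator.
    """
    logits = list(scores)
    mask = [0.0] * len(scores)
    for _ in range(k):
        p = softmax(logits, tau)
        mask = [m + pi for m, pi in zip(mask, p)]
        # Clamp avoids log(0) when p rounds to exactly 1.0.
        logits = [z + math.log(max(1.0 - pi, eps)) for z, pi in zip(logits, p)]
    return mask


def weighted_aux_loss(primary_loss, aux_losses, mask):
    """Hypothetical inner-level objective: primary loss plus mask-weighted auxiliary losses."""
    return primary_loss + sum(m * loss for m, loss in zip(mask, aux_losses))
```

For example, `soft_top_k([2.0, 0.5, 1.5, -1.0], k=2, tau=0.05)` puts nearly all mass on genes 0 and 2, whereas `tau=1.0` spreads mass over all four genes, giving smoother but less selective gradients.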