🤖 AI Summary
This study addresses the challenge of identifying spatially variable genes (SVGs) while controlling the false discovery rate (FDR) in ultra-high-dimensional spatial transcriptomics data. The authors propose a distribution-free screening framework that combines a knockoff procedure with a novel quasi-likelihood ratio statistic, the MM-test, which leverages auxiliary information such as spatial distances. The method applies to both two-dimensional and three-dimensional multi-slice data and comes with theoretical guarantees, including FDR control, selection consistency, and an error bound for post-selection clustering. Benchmarking on simulations and 34 real datasets shows that it consistently outperforms existing SVG detection methods; in a three-dimensional mouse brain dataset, it resolves fine anatomical structures such as the pyramidal layer of the hippocampal cornu ammonis and the dentate gyrus.
📝 Abstract
Spatial transcriptomics (ST) technologies enable transcriptome-wide gene expression profiling while preserving spatial resolution, offering unprecedented opportunities to uncover complex spatial structures. Due to the ultra-high dimensionality of ST data, identifying spatially variable genes (SVGs) associated with unknown spatial clusters has become a central task in ST data analysis. Here, we develop a distribution-free SVG screening method based on a novel quasi-likelihood ratio statistic, the MM-test, combined with a knockoff procedure to control the false discovery rate (FDR). The MM-test leverages auxiliary information about the unknown spatial domains, such as spatial distances, for SVG screening. Notably, in addition to two-dimensional ST datasets, the MM-test is well suited to increasingly common three-dimensional (3D), multi-slice ST datasets. Extensive benchmarking using simulations and 34 real ST datasets demonstrates that the MM-test consistently outperforms existing SVG detection methods. In a 3D mouse brain dataset, the MM-test accurately delineates fine-scale structures that are challenging for other methods, such as the 3D architecture of the pyramidal layer of the hippocampal cornu ammonis and the dentate gyrus. Theoretical guarantees, including selection consistency, FDR control, and an error bound for post-selection clustering, are also established.
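The abstract does not spell out how the knockoff step turns per-gene statistics into a selected gene set. As a rough illustration only, the sketch below implements the generic knockoff+ thresholding rule from the knockoff literature, not the authors' procedure. It assumes each gene already has a signed score `w[j]` that tends to be large and positive for true SVGs and symmetric about zero for null genes; how such scores would be built from the MM-test is not stated here. The function names and the target level `q` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, q=0.1):
    """Generic knockoff+ threshold (illustrative, not the paper's code).

    Returns the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q,
    or infinity if no such t exists.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative scores estimate how many false positives exceed t.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q=0.1):
    """Indices of features whose score reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


if __name__ == "__main__":
    # Toy scores for ten hypothetical genes (made-up numbers).
    scores = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.3, 0.2]
    print(knockoff_select(scores, q=0.2))
```

The `1 +` in the numerator is the conservative correction that distinguishes knockoff+ from the plain knockoff threshold. With few features or a small `q`, it can legitimately produce an empty selection.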