Computationally efficient and statistically accurate conditional independence testing with spaCRT

📅 2024-07-12
🤖 AI Summary
In high-dimensional, large-scale data analysis, such as single-cell CRISPR screens, conditional independence testing faces a trade-off between statistical accuracy and computational efficiency. To address this, the paper proposes spaCRT, which applies a saddlepoint approximation (SPA) to the resampling distribution of the distilled conditional randomization test (dCRT) statistic, so that no resampling is required. The authors prove that the spaCRT p-value approximates the dCRT p-value with vanishing relative error and that the two tests are asymptotically equivalent, which gives the SPA a rigorous theoretical foundation in the CRT setting. Experiments on synthetic and real single-cell multi-omics datasets show that spaCRT controls Type-I error, maintains high power, and matches the finite-sample performance of dCRT at a significantly lower runtime, outperforming other asymptotic and resampling-based methods.

📝 Abstract
We introduce the saddlepoint approximation-based conditional randomization test (spaCRT), a novel conditional independence test that effectively balances statistical accuracy and computational efficiency, inspired by applications to single-cell CRISPR screens. Resampling-based methods like the distilled conditional randomization test (dCRT) offer statistical precision but at a high computational cost. The spaCRT leverages a saddlepoint approximation to the resampling distribution of the dCRT test statistic, achieving very similar finite-sample statistical performance with significantly reduced computational demands. We prove that the spaCRT $p$-value approximates the dCRT $p$-value with vanishing relative error, and that these two tests are asymptotically equivalent. Through extensive simulations and real data analysis, we demonstrate that the spaCRT controls Type-I error and maintains high power, outperforming other asymptotic and resampling-based tests. Our method is particularly well-suited for large-scale single-cell CRISPR screen analyses, facilitating the efficient and accurate assessment of perturbation-gene associations.
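
The core idea in the abstract, replacing resampling with a saddlepoint approximation to a tail probability, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary treatment X_i with fitted probabilities mu_i = P(X_i = 1 | Z_i), weights a_i standing in for outcome residuals, and a statistic of the form T = sum_i a_i (X_i - mu_i) whose resamples draw X_i ~ Bernoulli(mu_i) independently. The function name `spa_tail_prob` and the use of the generic Lugannani-Rice formula are illustrative choices, not details taken from the paper.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _log_mgf_term(z, m):
    # log(1 - m + m * exp(z)), accurate near z = 0 and safe for large z.
    if z > 30.0:
        return z + math.log(m) + math.log1p((1.0 - m) / m * math.exp(-z))
    return math.log1p(m * math.expm1(z))


def spa_tail_prob(t, a, mu, tol=1e-12):
    """Saddlepoint (Lugannani-Rice) approximation to
    P(sum_i a_i * (X_i - mu_i) >= t), X_i ~ Bernoulli(mu_i) independent.
    Hypothetical helper; requires 0 < mu_i < 1."""
    logit = [math.log(m / (1.0 - m)) for m in mu]

    def K(s):  # cumulant generating function of the centered sum
        return sum(_log_mgf_term(ai * s, m) - m * ai * s for ai, m in zip(a, mu))

    def K1(s):  # first derivative: mean under exponential tilting
        return sum(ai * (_sigmoid(lg + ai * s) - m)
                   for ai, m, lg in zip(a, mu, logit))

    def K2(s):  # second derivative: variance under exponential tilting
        total = 0.0
        for ai, lg in zip(a, logit):
            p = _sigmoid(lg + ai * s)
            total += ai * ai * p * (1.0 - p)
        return total

    # Bracket the saddlepoint, i.e. the root of K1(s) = t (K1 is nondecreasing).
    lo, hi = -1.0, 1.0
    for _ in range(60):
        if K1(lo) <= t <= K1(hi):
            break
        lo *= 2.0
        hi *= 2.0
    else:
        raise ValueError("t lies outside the interior of the statistic's support")

    # Bisection for the saddlepoint.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < t:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    s = 0.5 * (lo + hi)

    # Near s = 0 the formula is numerically unstable; fall back to a normal tail.
    if abs(s) < 1e-5:
        return 0.5 * math.erfc(t / math.sqrt(2.0 * K2(0.0)))

    w = math.copysign(math.sqrt(max(2.0 * (s * t - K(s)), 0.0)), s)
    v = s * math.sqrt(K2(s))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    p = 0.5 * math.erfc(w / math.sqrt(2.0)) + phi * (1.0 / v - 1.0 / w)
    return min(max(p, 0.0), 1.0)
```

Under these assumptions, a right-sided p-value would be `spa_tail_prob(T_obs, a, mu)`, which costs one root-finding step per hypothesis instead of many resamples. The normal fallback near zero and the bisection solver are simplifications chosen for readability.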

Problem

Research questions and friction points this paper is trying to address.

Develops accurate saddlepoint approximations for resampling-based large-scale hypothesis testing
Introduces spaCRT for efficient conditional independence testing without resampling
Validates spaCRT with modern regression tools on sparse genomic datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Saddlepoint approximations for resampling-based procedures
Theoretical foundation for conditional tail probabilities
Resampling-free conditional independence test (spaCRT)