Doubly robust and computationally efficient high-dimensional variable selection

📅 2024-09-14
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the computational inefficiency of conditional independence testing–based variable selection (e.g., with the projected covariance measure, PCM) in high-dimensional settings, this paper proposes tower PCM (tPCM), a computationally efficient and doubly robust extension of PCM. tPCM requires only a single response-on-all-predictors fit plus an estimate of the joint distribution of the predictors, reducing the cost from O(p) machine learning fits to O(1). The authors prove that tPCM is doubly robust and asymptotically equivalent to both PCM and the holdout randomization test (HRT), extending the bridge between the model-X framework and doubly robust inference while providing per-variable p-values. Simulations demonstrate up to a 130× speedup over PCM with comparable statistical power. In applications where the predictor distribution is well modeled, such as genome-wide association studies (GWAS), tPCM improves on knockoffs by returning per-variable p-values and on HRT by being faster.

📝 Abstract
Variable selection can be performed by testing conditional independence (CI) between each predictor and the response, given the other predictors. A doubly robust and powerful option for these CI tests is the projected covariance measure (PCM) test. However, directly deploying PCM for variable selection brings computational challenges: testing a single variable involves a few machine learning fits, so testing $p$ variables requires $O(p)$ fits. Inspired by model-X ideas, we observe that an estimate of the joint predictor distribution and a single response-on-all-predictors fit can be used to reconstruct all PCM fits. This yields tower PCM (tPCM), a computationally efficient extension of PCM to variable selection. When the joint predictor distribution is sufficiently tractable, as in applications like genome-wide association studies, tPCM offers a substantial speedup over PCM -- up to 130$\times$ in our simulations -- while matching its power. tPCM also improves on model-X methods like knockoffs and the holdout randomization test (HRT) by returning per-variable $p$-values and improving speed, respectively. We prove that tPCM is doubly robust and asymptotically equivalent to both PCM and HRT. We thus extend the bridge between model-X and doubly robust approaches, demonstrating their independent arrival at equivalent methods and showing that this intersection is a fruitful source of new methodologies like tPCM.
Problem

Research questions and friction points this paper is trying to address.

Develops computationally efficient variable selection for high-dimensional data
Addresses scalability of conditional independence testing in large datasets
Bridges model-X methods with doubly robust statistical approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Projected covariance measure test for conditional independence
Model-X framework for joint predictor distribution estimation
Tower PCM algorithm requiring O(1) machine learning fits instead of O(p)
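The core idea above can be illustrated with a toy sketch. This is NOT the paper's exact procedure: it uses linear least squares in place of arbitrary machine learning fits, assumes the Gaussian predictor distribution is known exactly, and uses a simple standardized residual-covariance statistic. It only demonstrates the "tower" trick: a single full fit plus the conditional law of each $X_j$ given $X_{-j}$ reconstructs every leave-one-out regression, so no per-variable response refit is needed.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Toy data: correlated Gaussian predictors, linear signal in the
# first 3 coordinates. All names here are illustrative, not from the paper.
n, p = 2000, 8
A = rng.normal(size=(p, p)) / math.sqrt(p)
Sigma = A @ A.T + np.eye(p)                  # known predictor covariance
X = rng.normal(size=(n, p)) @ np.linalg.cholesky(Sigma).T
beta = np.zeros(p)
beta[:3] = 1.0                               # first 3 predictors are signal
y = X @ beta + rng.normal(size=n)

# Step 1: a SINGLE response-on-all-predictors fit (least squares here
# stands in for any machine learning regression).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

pvals = []
for j in range(p):
    idx = [k for k in range(p) if k != j]
    # Gaussian conditional of X_j given X_{-j}: mean and residual.
    w = np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, j])
    mu_j = X[:, idx] @ w
    eps_x = X[:, j] - mu_j
    # "Tower" step: for a linear fit, E[f_hat(X) | X_{-j}] is the fit
    # evaluated with X_j replaced by its conditional mean (in general,
    # one would average f_hat over draws of X_j given X_{-j}).
    X_imp = X.copy()
    X_imp[:, j] = mu_j
    eps_y = y - X_imp @ coef
    # Standardized covariance of the two residuals; asymptotically
    # N(0,1) under the null of conditional independence.
    prod = eps_y * eps_x
    stat = math.sqrt(n) * prod.mean() / prod.std()
    pvals.append(2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2)))))
```

Note that the loop performs only cheap linear algebra per variable; the one expensive model fit happens once before the loop, which is the source of the O(p)-to-O(1) reduction in machine learning fits.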