Simultaneous inference for generalized linear models with unmeasured confounders

📅 2023-09-13

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 1

🤖 AI Summary

In genomic studies, unmeasured confounders can bias large-scale hypothesis testing. To address this, the paper proposes a unified estimation and inference framework for multivariate generalized linear models (GLMs) under arbitrary confounding mechanisms. The approach exploits orthogonal structures and linear projections in three stages: it separates marginal and uncorrelated confounding effects to recover the column space of the confounding coefficients, jointly estimates latent factors and primary effects with ℓ₁-regularization, and applies projected and weighted bias correction for hypothesis testing. Theoretically, the authors establish identification conditions and non-asymptotic error bounds, and show that the asymptotic z-tests control Type I error as the sample and response sizes grow. Numerical experiments indicate that the method controls the false discovery rate under the Benjamini–Hochberg procedure and is more powerful than alternative methods; a comparison of single-cell RNA-seq counts from two groups illustrates the value of adjusting for confounding when relevant covariates are missing from the model.

📝 Abstract

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
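The abstract's final testing step combines asymptotic $z$-tests with the Benjamini-Hochberg procedure. As a rough illustration of that last step only, the sketch below converts a handful of made-up z-statistics into two-sided p-values and applies the standard Benjamini-Hochberg step-up rule. The function names and the example z-values are hypothetical and are not taken from the paper; the bias-corrected statistics themselves are assumed to be given.

```python
import numpy as np
from math import erfc, sqrt


def two_sided_p_values(z):
    """Two-sided p-values for z-statistics under a standard normal null."""
    z = np.asarray(z, dtype=float)
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return np.array([erfc(abs(v) / sqrt(2.0)) for v in z])


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean rejection mask from the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of sorted p-values
    thresholds = alpha * np.arange(1, m + 1) / m  # alpha * k / m for rank k
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest rank passing its threshold
        reject[order[: k + 1]] = True             # reject everything up to that rank
    return reject


# Hypothetical z-statistics, one per gene (illustrative values only)
z_stats = [4.5, -3.9, 0.3, -1.1, 2.0, 0.1]
rejected = benjamini_hochberg(two_sided_p_values(z_stats), alpha=0.05)
print(rejected)
```

With these toy values only the two largest statistics in absolute value are rejected; the statistic 2.0 has a two-sided p-value near 0.046, which exceeds its rank-3 threshold of 0.025.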

Problem

Research questions and friction points this paper is trying to address.

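Addresses bias in large-scale genomic hypothesis testing caused by unmeasured confounders.

Proposes an estimation and inference framework for large-scale hypothesis testing in multivariate GLMs.

Targets Type-I error control for the asymptotic z-tests and FDR control via the Benjamini-Hochberg procedure.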

Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal structures separate marginal and uncorrelated confounding effects.

ℓ₁-regularized (lasso-type) optimization jointly estimates latent factors and primary effects.

Projected and weighted bias correction supports valid hypothesis testing.
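The idea of using orthogonal projections to recover the column space of the confounding coefficients can be illustrated on a toy linear-Gaussian model. This is a deliberate simplification and not the paper's GLM procedure: the sketch assumes a linear model, a known number of latent factors `r`, and simulated data, and every variable name is hypothetical. It projects the responses onto the orthogonal complement of the observed covariate and reads the column space off the leading right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 500, 200, 2                    # samples, responses, latent factors (toy sizes)

X = rng.normal(size=(n, 1))              # observed covariate
W = rng.normal(size=(n, r))              # confounder component uncorrelated with X
Z = X @ rng.normal(size=(1, r)) + W      # latent confounders, correlated with X
B = np.zeros((p, 1))
B[:10] = 1.0                             # sparse primary effects
Gamma = rng.normal(size=(p, r))          # confounding coefficients
Y = X @ B.T + Z @ Gamma.T + rng.normal(size=(n, p))

# Project the responses onto the orthogonal complement of col(X).
# This removes the marginal effect of X and leaves W @ Gamma.T plus noise.
P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
R = P_perp @ Y

# The top-r right singular vectors of the residual estimate col(Gamma).
_, _, Vt = np.linalg.svd(R, full_matrices=False)
V_hat = Vt[:r].T                         # p x r orthonormal basis

# Compare the estimated and true subspaces through their projection matrices;
# the spectral norm of the difference is the sine of the largest principal angle.
Q, _ = np.linalg.qr(Gamma)
subspace_error = np.linalg.norm(V_hat @ V_hat.T - Q @ Q.T, 2)
print(f"subspace error: {subspace_error:.3f}")
```

In this toy setting the projection removes everything explained by `X`, so the residual carries only the uncorrelated confounding part and noise. The paper's later stages (joint ℓ₁-regularized estimation and bias correction) are not reproduced here.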
