Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes

📅 2024-04-14
📈 Citations: 3
Influential: 0
🤖 AI Summary
In single-cell RNA sequencing, an individual's true gene expression is not directly observable; it must be estimated from repeated proxy measurements across that individual's cells, which complicates causal inference when there are many heterogeneous derived outcomes. To address this, the paper proposes a doubly robust semiparametric inference framework for multiple derived outcomes, which also covers the usual multiple-outcome setting in which each unit's response is observed. The analysis is specialized to standardized average treatment effects and quantile treatment effects, with doubly robust estimators derived from Von Mises expansions and estimating equations. A multiple testing procedure based on the Gaussian multiplier bootstrap controls the false discovery exceedance (FDX) rate. Applications to single-cell CRISPR perturbation data and individual-level differential expression analysis demonstrate the methods and offer insights into the use of different estimands for causal inference in genomics.

📝 Abstract
With the evolution of single-cell RNA sequencing techniques into a standard approach in genomics, it has become possible to conduct cohort-level causal inferences based on single-cell-level measurements. However, the individual gene expression levels of interest are not directly observable; instead, only repeated proxy measurements from each individual's cells are available, providing a derived outcome to estimate the underlying outcome for each of many genes. In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual setting of multiple outcomes when the response of each unit is available. To reliably quantify the causal effects of heterogeneous outcomes, we specialize the analysis to standardized average treatment effects and quantile treatment effects. Through this, we demonstrate the use of the semiparametric inferential results for doubly robust estimators derived from both Von Mises expansions and estimating equations. A multiple testing procedure based on Gaussian multiplier bootstrap is tailored for doubly robust estimators to control the false discovery exceedance rate. Applications in single-cell CRISPR perturbation analysis and individual-level differential expression analysis demonstrate the utility of the proposed methods and offer insights into the usage of different estimands for causal inference in genomics.
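
The two ingredients described above, derived outcomes and doubly robust estimation, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that the derived outcome is the per-individual mean of cell-level measurements and that the estimator is the standard augmented inverse probability weighting (AIPW) form for the unstandardized average treatment effect. The function names `derived_outcomes` and `aipw_ate`, the simulated data, and the simple linear nuisance fits are all hypothetical choices made for illustration.

```python
import numpy as np

def derived_outcomes(cell_values, individual_ids, n_individuals):
    """Average cell-level measurements within each individual.

    cell_values: (n_cells, n_genes) proxy measurements.
    individual_ids: (n_cells,) integer label of the individual each cell belongs to.
    Returns an (n_individuals, n_genes) matrix of per-individual means.
    """
    cell_values = np.asarray(cell_values, dtype=float)
    individual_ids = np.asarray(individual_ids)
    sums = np.zeros((n_individuals, cell_values.shape[1]))
    np.add.at(sums, individual_ids, cell_values)  # accumulate rows per individual
    n_cells = np.bincount(individual_ids, minlength=n_individuals)
    return sums / n_cells[:, None]

def aipw_ate(Y, A, propensity, mu1, mu0):
    """AIPW estimate of the average treatment effect for each outcome column.

    Y: (n, p) outcomes; A: (n,) binary treatment; propensity: (n,) estimates of P(A=1 | X);
    mu1, mu0: (n, p) outcome-regression predictions under treatment and control.
    Returns estimates, standard errors, and centered influence-function values.
    """
    Y = np.asarray(Y, dtype=float)
    a = np.asarray(A, dtype=float)[:, None]
    e = np.asarray(propensity, dtype=float)[:, None]
    phi = mu1 - mu0 + a * (Y - mu1) / e - (1 - a) * (Y - mu0) / (1 - e)
    est = phi.mean(axis=0)
    se = phi.std(axis=0, ddof=1) / np.sqrt(Y.shape[0])
    return est, se, phi - est

# Simulated example (all quantities below are made up for illustration).
rng = np.random.default_rng(0)
n, p = 300, 3
X = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-X))
A = rng.binomial(1, e_true)
tau = np.array([1.0, 0.0, -0.5])  # effects used to generate the data
theta = X[:, None] + A[:, None] * tau + rng.normal(size=(n, p))  # latent expression
ids = np.repeat(np.arange(n), rng.integers(5, 20, size=n))       # cells per individual
cells = theta[ids] + rng.normal(size=(len(ids), p))              # noisy proxies
Y = derived_outcomes(cells, ids, n)

# Outcome regressions fitted separately in each arm by least squares.
D = np.column_stack([np.ones(n), X])
mu1 = D @ np.linalg.lstsq(D[A == 1], Y[A == 1], rcond=None)[0]
mu0 = D @ np.linalg.lstsq(D[A == 0], Y[A == 0], rcond=None)[0]
est, se, _ = aipw_ate(Y, A, e_true, mu1, mu0)
print("estimates:", est)
print("standard errors:", se)
```

The sketch plugs in the known simulation propensity for brevity; in practice both the propensity and the outcome regressions would be estimated, and the doubly robust property means the estimator remains consistent if either one is correctly specified.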

Problem

Research questions and friction points this paper is trying to address.

Estimating causal effects from noisy single-cell genomic data
Handling multiple heterogeneous outcomes in causal inference
Controlling the false discovery exceedance rate in genomic multiple testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric framework for doubly robust estimation
Specialization to standardized average treatment effects and quantile treatment effects
Gaussian multiplier bootstrap for false discovery exceedance control
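
To make the multiplier bootstrap idea concrete, here is a minimal Python sketch of its basic building block: a simultaneous critical value for the maximum absolute t-statistic, obtained by reweighting centered influence-function values with Gaussian multipliers. This is a simplification chosen for illustration, not the paper's FDX-controlling procedure. The function names `multiplier_bootstrap_max_quantile` and `simultaneous_rejections`, and the defaults for `alpha`, `n_boot`, and `seed`, are hypothetical.

```python
import numpy as np

def multiplier_bootstrap_max_quantile(phi_centered, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap the (1 - alpha) quantile of the max absolute standardized statistic.

    phi_centered: (n, p) centered influence-function values, one column per outcome.
    """
    phi_centered = np.asarray(phi_centered, dtype=float)
    n, _ = phi_centered.shape
    z = phi_centered / phi_centered.std(axis=0, ddof=1)  # standardize each column
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_boot, n))                # Gaussian multipliers
    boot = np.abs(xi @ z) / np.sqrt(n)                   # (n_boot, p) bootstrap statistics
    return np.quantile(boot.max(axis=1), 1 - alpha)

def simultaneous_rejections(est, phi_centered, alpha=0.05, n_boot=2000, seed=0):
    """Reject hypotheses whose absolute t-statistic exceeds the bootstrap critical value."""
    phi_centered = np.asarray(phi_centered, dtype=float)
    n = phi_centered.shape[0]
    se = phi_centered.std(axis=0, ddof=1) / np.sqrt(n)
    t_stats = np.asarray(est, dtype=float) / se
    crit = multiplier_bootstrap_max_quantile(phi_centered, alpha, n_boot, seed)
    return np.abs(t_stats) > crit, crit

# Simulated example: 50 outcomes, only the first has a nonzero effect.
rng = np.random.default_rng(1)
n, p = 200, 50
phi = rng.normal(size=(n, p))
phi_centered = phi - phi.mean(axis=0)
se = phi_centered.std(axis=0, ddof=1) / np.sqrt(n)
est = np.zeros(p)
est[0] = 10 * se[0]
rejected, crit = simultaneous_rejections(est, phi_centered)
print("critical value:", crit)
print("rejected outcomes:", np.flatnonzero(rejected))
```

Because the multipliers are applied to the whole row of influence values at once, the bootstrap statistics inherit the correlation across outcomes, which is why the resulting critical value can be less conservative than a Bonferroni correction when outcomes are dependent.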