Treatment response as a latent variable

2025-02-12
AI Summary
Clinical response varies naturally, which makes it hard to distinguish responders from non-responders and is a key bottleneck in causal analysis of response heterogeneity. To address this, we propose the causal two-groups (C2G) model, which formalizes treatment response as a latent variable, together with two empirical Bayes procedures: one semi-parametric and one fully nonparametric. For the unidentifiable nonparametric model, we propose two novel estimands and a strategy for deriving estimand intervals. By combining causal inference, latent variable modeling, and false discovery rate (FDR) control, both procedures control the FDR at the target level with near-optimal power, as shown theoretically and empirically. Applied to a cancer immunotherapy dataset, the nonparametric C2G model recovers clinically validated predictive biomarkers of both positive and negative outcomes.

πŸ“ Abstract
Scientists often need to analyze the samples in a study that responded to treatment in order to refine their hypotheses and find potential causal drivers of response. Natural variation in outcomes makes teasing apart responders from non-responders a statistical inference problem. To handle latent responses, we introduce the causal two-groups (C2G) model, a causal extension of the classical two-groups model. The C2G model posits that treated samples may or may not experience an effect, according to some prior probability. We propose two empirical Bayes procedures for the causal two-groups model, one under semi-parametric conditions and another under fully nonparametric conditions. The semi-parametric model assumes additive treatment effects and is identifiable from observed data. The nonparametric model is unidentifiable, but we show it can still be used to test for response in each treated sample. We show empirically and theoretically that both methods for selecting responders control the false discovery rate at the target level with near-optimal power. We also propose two novel estimands of interest and provide a strategy for deriving estimand intervals in the unidentifiable nonparametric model. On a cancer immunotherapy dataset, the nonparametric C2G model recovers clinically-validated predictive biomarkers of both positive and negative outcomes. Code is available at https://github.com/tansey-lab/causal2groups.
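
As a rough illustration of the idea the abstract builds on, the sketch below simulates the classical (non-causal) two-groups model and selects samples by their posterior null probability. It is not the authors' C2G procedure: the null and alternative densities and the prior are assumed known here, whereas the paper estimates such quantities with empirical Bayes. The function names `normal_pdf` and `select_by_local_fdr`, the effect size of 3.0, and the prior of 0.8 are all hypothetical choices made for this example.

```python
import numpy as np


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def select_by_local_fdr(null_probs, alpha=0.1):
    """Return indices of the largest set whose mean posterior null
    probability is at most alpha (a standard Bayesian FDR rule)."""
    null_probs = np.asarray(null_probs, dtype=float)
    order = np.argsort(null_probs)  # most confident discoveries first
    # The running mean of an ascending sequence is non-decreasing, so
    # counting entries <= alpha gives the largest admissible set size.
    running_mean = np.cumsum(null_probs[order]) / np.arange(1, len(order) + 1)
    n_selected = int(np.sum(running_mean <= alpha))
    return order[:n_selected]


# Simulate a two-groups mixture (assumed values, for illustration only).
rng = np.random.default_rng(0)
n, prior_null = 5000, 0.8
is_responder = rng.random(n) > prior_null
z = rng.normal(loc=np.where(is_responder, 3.0, 0.0), scale=1.0)

# Posterior probability that each sample is a non-responder (local FDR).
f0 = normal_pdf(z, mean=0.0)
f1 = normal_pdf(z, mean=3.0)
null_probs = prior_null * f0 / (prior_null * f0 + (1.0 - prior_null) * f1)

selected = select_by_local_fdr(null_probs, alpha=0.1)
false_discovery_proportion = float(np.mean(~is_responder[selected]))
print(f"selected {len(selected)} samples; "
      f"empirical false discovery proportion = {false_discovery_proportion:.3f}")
```

When the posterior null probabilities are correct, the mean of the selected probabilities is the expected proportion of false discoveries, which is why thresholding the running mean at `alpha` targets the FDR. The authors' own implementation is in the repository linked in the abstract.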

Problem

Research questions and friction points this paper is trying to address.

Identifies treatment responders using statistical models
Controls the false discovery rate at the target level with near-optimal power
Recovers predictive biomarkers in immunotherapy datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal two-groups model
Empirical Bayes procedures
Nonparametric model for testing
Christopher Tosh
Memorial Sloan Kettering Cancer Center, New York, NY
Boyuan Zhang
Stanford University, Palo Alto, CA
Wesley Tansey
Memorial Sloan Kettering Cancer Center
Machine Learning, Bayesian Statistics, Deep Learning, Hypothesis Testing, Computational Biology