🤖 AI Summary
This paper addresses the challenge of estimating continuous-variable causal effects when selection bias and confounding coexist, rendering the interventional expectation $E[Y \mid \text{do}(X)]$ non-identifiable and leading to substantial estimation bias. To resolve this, we establish novel identifiability conditions and propose a two-step regression (TSR) estimation framework. TSR is the first method to jointly model and correct both biases: it leverages proxy variables to adjust for selection bias while simultaneously controlling for confounding. Crucially, TSR incorporates external data and is proven, under mild, general conditions, to be consistent and to achieve lower asymptotic variance than existing estimators. Extensive simulations demonstrate that TSR consistently outperforms state-of-the-art methods across diverse combinations of selection and confounding biases, delivering robust and accurate causal effect recovery.
📝 Abstract
We consider the problem of estimating the expected causal effect $E[Y|do(X)]$ for a target variable $Y$ when treatment $X$ is set by intervention, focusing on continuous random variables. In settings without selection bias or confounding, $E[Y|do(X)] = E[Y|X]$, which can be estimated using standard regression methods. However, regression fails when the data are distorted by confounding or by the systematic missingness that selection bias induces. Boeken et al. [2023] show that when training data is subject to selection, proxy variables unaffected by this process can, under certain constraints, be used to correct for selection bias to estimate $E[Y|X]$, and hence $E[Y|do(X)]$, reliably. When data is additionally affected by confounding, however, this equality is no longer valid. Building on these results, we consider a more general setting and propose a framework that incorporates both selection bias and confounding. Specifically, we derive theoretical conditions ensuring identifiability and recoverability of causal effects given access to external data and proxy variables. We further introduce a two-step regression estimator (TSR), capable of exploiting proxy variables to adjust for selection bias while accounting for confounding. We show that TSR coincides with prior work if confounding is absent, but achieves a lower variance. Extensive simulation studies validate TSR's correctness for scenarios that may include both selection bias and confounding with proxy variables.
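The claim that plain regression recovers $E[Y|do(X)]$ only when neither selection bias nor confounding is present can be illustrated with a small simulation. The sketch below is not the TSR estimator and does not use proxy variables or external data. It assumes a hypothetical linear model with a true effect of 2, a selection rule that keeps only samples with $Y > 0$, and a single observed confounder `z`; all of these names and values are chosen for illustration only.

```python
import numpy as np


def ols_slope(x, y, *covariates):
    """Least-squares coefficient on x, fitted with an intercept and optional covariates."""
    design = np.column_stack([np.ones_like(x), x, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 50_000
true_effect = 2.0  # assumed causal effect of X on Y (illustrative value)

# Case 1: no selection, no confounding, so E[Y|do(X)] = E[Y|X].
x = rng.normal(size=n)
y = true_effect * x + rng.normal(size=n)
slope_clean = ols_slope(x, y)

# Case 2: selection bias. Only samples with Y > 0 are observed
# (a hypothetical selection mechanism that depends on the outcome).
selected = y > 0
slope_selected = ols_slope(x[selected], y[selected])

# Case 3: confounding. Z drives both X and Y.
z = rng.normal(size=n)
x_c = z + rng.normal(size=n)
y_c = true_effect * x_c + 3.0 * z + rng.normal(size=n)
# Regressing Y on X alone converges to 2 + 3 * cov(X, Z) / var(X) = 3.5 in this model.
slope_confounded = ols_slope(x_c, y_c)
# Including the observed confounder as a covariate removes that bias here.
slope_adjusted = ols_slope(x_c, y_c, z)

print(f"clean:      {slope_clean:.3f}")
print(f"selected:   {slope_selected:.3f}")
print(f"confounded: {slope_confounded:.3f}")
print(f"adjusted:   {slope_adjusted:.3f}")
```

In this toy model the selected-sample slope is attenuated below the true effect and the confounded slope is inflated above it, while the clean and covariate-adjusted fits stay close to 2. The covariate adjustment works only because `z` is fully observed in this sketch; the setting described in the abstract is harder, since selection and confounding act together and correction relies on proxy variables and external data.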