🤖 AI Summary
This paper addresses counterfactual inference in instrumental variable (IV) settings with nonseparable outcome models. Traditional IV methods rely on restrictive additive-noise assumptions, which limit their applicability. To overcome this, the authors propose novel identification conditions based on latent-variable monotonicity and joint Gaussianity, establishing, for the first time, causal identifiability in nonseparable frameworks without additive noise. They further develop an estimation framework that integrates structural causal models with normalizing flows, jointly modeling the latent variables and counterfactual distributions by maximizing the observed-data likelihood, and they prove identifiability under the stated conditions. Empirical evaluation on synthetic and semi-synthetic datasets shows that the method accurately recovers the latent potential outcome function and substantially improves counterfactual prediction accuracy, broadening the scope of valid IV-based identification in nonseparable causal inference beyond classical parametric constraints.
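To make the estimation idea concrete, the sketch below is a minimal, hypothetical reconstruction of a Flow IV-style likelihood from the description above, not the authors' implementation. It assumes scalar instrument Z, treatment T, and outcome Y with structural equations T = g(Z, ε_T) and Y = f(T, ε_Y), each strictly increasing in its noise, and (ε_T, ε_Y) jointly Gaussian with correlation ρ. The flows parameterize the inverses g⁻¹(·; Z) and f⁻¹(·; T); all names (`MonotoneConditionalFlow`, `FlowIV`) are illustrative.

```python
import math

import torch
import torch.nn as nn


class MonotoneConditionalFlow(nn.Module):
    """Map an observed scalar x to a latent noise eps, strictly increasing
    in x for every conditioning value c (positive-weight tanh mixture plus
    a positive linear term). Returns (eps, log d(eps)/d(x))."""

    def __init__(self, n_terms=16):
        super().__init__()
        # Conditioner outputs one linear slope and n_terms (w, s, t) triples.
        self.net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                                 nn.Linear(64, 1 + 3 * n_terms))

    def forward(self, x, c):
        params = self.net(c)
        a, rest = params[..., :1], params[..., 1:]
        w, s, t = rest.chunk(3, dim=-1)
        a = nn.functional.softplus(a)                  # positive slope
        w = nn.functional.softplus(w)                  # positive weights
        s = nn.functional.softplus(s)                  # positive scales
        h = torch.tanh(s * x + t)
        eps = a * x + (w * h).sum(-1, keepdim=True)    # strictly increasing in x
        deriv = a + (w * s * (1 - h ** 2)).sum(-1, keepdim=True)
        return eps, torch.log(deriv)


class FlowIV(nn.Module):
    """Two conditional flows learn eps_T = g^{-1}(T; Z) and
    eps_Y = f^{-1}(Y; T); rho is the learnable correlation of the
    jointly Gaussian noises (the confounding strength)."""

    def __init__(self):
        super().__init__()
        self.flow_T = MonotoneConditionalFlow()
        self.flow_Y = MonotoneConditionalFlow()
        self.rho_raw = nn.Parameter(torch.zeros(1))    # rho = tanh(rho_raw) in (-1, 1)

    def log_likelihood(self, z, t, y):
        e1, ld1 = self.flow_T(t, z)
        e2, ld2 = self.flow_Y(y, t)
        rho = torch.tanh(self.rho_raw)
        # Standard bivariate Gaussian log-density with correlation rho,
        # plus the change-of-variables terms from the two flows.
        quad = (e1 ** 2 - 2 * rho * e1 * e2 + e2 ** 2) / (1 - rho ** 2)
        log_gauss = -math.log(2 * math.pi) - 0.5 * torch.log(1 - rho ** 2) - 0.5 * quad
        return log_gauss + ld1 + ld2


# Toy confounded data: a shared factor u correlates the two noises,
# while z is a valid instrument (affects T only, independent of u).
torch.manual_seed(0)
n = 4000
z = torch.randn(n, 1)
u = torch.randn(n, 1)                                  # unobserved confounder
eps_t = 0.6 * u + 0.8 * torch.randn(n, 1)
eps_y = 0.6 * u + 0.8 * torch.randn(n, 1)
t = z + eps_t                                          # increasing in eps_t
y = torch.tanh(t) + eps_y                              # increasing in eps_y

model = FlowIV()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):                               # maximize observed-data likelihood
    loss = -model.log_likelihood(z, t, y).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The positivity constraints (via softplus) are one simple way to enforce the monotonicity that the identification conditions require; any strictly monotone conditional flow would serve the same role.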
📝 Abstract
To reach human-level intelligence, learning algorithms need to incorporate causal reasoning. But identifying causal effects, and performing counterfactual reasoning in particular, remains an elusive task. In this paper, we make progress on this task by utilizing instrumental variables (IVs). IVs are a classic tool for mitigating bias from unobserved confounders when estimating causal effects. While IV methods have been extended to nonseparable structural models at the population level, existing approaches to counterfactual prediction typically assume additive noise in the outcome. We show that under standard IV assumptions, together with the assumptions that the treatment and outcome functions are strictly monotonic in their latent noises and that these noises are jointly Gaussian, the treatment-outcome relationship becomes uniquely identifiable from observed data. This enables counterfactual inference even in nonseparable models. We implement our approach by training a normalizing flow to maximize the likelihood of the observed data, demonstrating accurate recovery of the underlying outcome function. We call our method Flow IV.
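With such a model fit, counterfactual prediction follows the usual abduction-action-prediction pattern for structural causal models. The helper below is again a hedged sketch that assumes the hypothetical trained `model` (and data `t`, `y`) from the earlier snippet: strict monotonicity of the outcome flow in y makes the counterfactual outcome the unique root of a one-dimensional equation, solved here by bisection.

```python
import torch


@torch.no_grad()
def counterfactual_y(model, t_fact, y_fact, t_cf, lo=-20.0, hi=20.0, iters=60):
    """Abduct eps_Y = f^{-1}(y_fact; t_fact), then solve
    f^{-1}(y'; t_cf) = eps_Y for y' by bisection (the flow is
    strictly increasing in y, so the root is unique)."""
    eps_y, _ = model.flow_Y(y_fact, t_fact)            # abduction
    lo = torch.full_like(y_fact, lo)
    hi = torch.full_like(y_fact, hi)
    for _ in range(iters):                             # action + prediction
        mid = 0.5 * (lo + hi)
        val, _ = model.flow_Y(mid, t_cf)
        lo = torch.where(val < eps_y, mid, lo)
        hi = torch.where(val < eps_y, hi, mid)
    return 0.5 * (lo + hi)


# e.g., the counterfactual outcome had the first unit received treatment 0:
y_cf = counterfactual_y(model, t[:1], y[:1], torch.zeros(1, 1))
```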