🤖 AI Summary
This study addresses the challenge of causal inference under sample selection bias, where both treatment assignment and outcome observability are non-random. The authors extend the Riesz representation framework to the sample selection model for the first time and propose an automated debiased learning approach based on the ForestRiesz estimator. By reformulating causal effect estimation as a Riesz representer learning problem, the method enhances estimation stability while enabling an interpretable three-component decomposition of omitted-variable bias. Simulation experiments demonstrate that the ForestRiesz estimator achieves greater stability than conventional inverse probability weighting. In an empirical application to the gender wage gap, the approach not only reveals that traditional methods substantially underestimate treatment effects but also exhibits strong robustness to unobserved confounding.
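For intuition, in the automatic debiased ML literature such a decomposition is typically written in the following generic form (this is a sketch of the standard omitted-variable-bias bound for Riesz-representer estimands, not the paper's selection-specific statement; the paper's exact definitions of the components may differ):

$$
\mathrm{Bias}^2 \;=\; S^2 \, C_Y^2 \, C_\alpha^2,
\qquad
S^2 = E\big[(Y - g_s(W))^2\big]\; E\big[\alpha_s(W)^2\big],
$$

where $g_s$ and $\alpha_s$ are the outcome regression and Riesz representer on the *observed* covariates, $S^2$ is the data-identified scale factor, $C_Y^2$ measures how strongly unobserved confounders explain the residual outcome variation, and $C_\alpha^2$ measures the corresponding relative gain in the Riesz representer (here covering selection as well as treatment confounding). Bounding $C_Y^2$ and $C_\alpha^2$ by plausible benchmark values then bounds the bias, which is what the sensitivity analysis exploits.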
📝 Abstract
In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a data-identified scale factor, outcome confounding strength, and selection confounding strength. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability associated with direct propensity score inversion. We assess finite-sample performance through a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters due to their reliance on inverse probability weighting, whereas the ForestRiesz estimator delivers more stable performance by leveraging automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., we find that our ForestRiesz approach yields larger treatment effect estimates than a standard double machine learning approach, suggesting that ignoring sample selection leads to an underestimation of the gender wage gap. Sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn our results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.
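To make the Riesz-representer reformulation concrete, the sketch below estimates an average treatment effect on simulated data by learning the Riesz representer directly from its defining loss, so no propensity score is ever estimated or inverted. This is not the paper's ForestRiesz estimator: it uses a hand-rolled linear basis instead of a forest and ignores sample selection entirely; the data-generating process, basis, and tolerances are all illustrative assumptions.

```python
# Minimal automatic-debiased-ML sketch for the ATE via a Riesz representer.
# Illustrative only: linear basis instead of ForestRiesz, no sample selection.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * X))       # true propensity score (never inverted below)
T = rng.binomial(1, e).astype(float)
Y = 1.0 * T + X + rng.normal(size=n)     # true ATE = 1.0


def basis(t, x):
    """Linear basis b(t, x) = (1, t, x, t*x)."""
    return np.column_stack([np.ones_like(x), t, x, t * x])


B = basis(T, X)

# Outcome regression g(t, x): OLS of Y on the basis.
beta, *_ = np.linalg.lstsq(B, Y, rcond=None)
g = B @ beta
g1 = basis(np.ones(n), X) @ beta
g0 = basis(np.zeros(n), X) @ beta

# Riesz representer alpha(t, x) = theta' b(t, x), learned by minimizing the
# empirical loss  E[alpha(W)^2 - 2 (alpha(1, X) - alpha(0, X))].
# The first-order condition gives a closed form: (B'B/n) theta = E[b(1,X) - b(0,X)].
M = (basis(np.ones(n), X) - basis(np.zeros(n), X)).mean(axis=0)
theta = np.linalg.solve(B.T @ B / n, M)
alpha = B @ theta

# Debiased (doubly robust) ATE estimate combining regression and representer.
ate = np.mean(g1 - g0 + alpha * (Y - g))
print(ate)
```

The key point is in the `theta` step: the representer that double machine learning would build as $T/e(X) - (1-T)/(1-e(X))$ is instead obtained by solving a well-conditioned least-squares problem, which is what removes the sensitivity to small estimated propensities.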