🤖 AI Summary
Selecting the best heterogeneous treatment effect (HTE) estimator from a set of candidates is a fundamental challenge in causal inference, because the true treatment effects are never observed. This paper casts estimator selection as a multiple testing problem and proposes a ground-truth-free procedure built on a cross-fitted, exponentially weighted test statistic. A two-way sample splitting scheme decouples nuisance estimation from weight learning, which provides the stability needed for valid inference. Using a stability-based central limit theorem, the authors establish asymptotic familywise error rate (FWER) control under mild regularity conditions. On the ACIC 2016, IHDP, and Twins benchmarks, the procedure maintains error control while substantially reducing false selections relative to commonly used methods.
📝 Abstract
We study the problem of selecting the best heterogeneous treatment effect (HTE) estimator from a collection of candidates in settings where the treatment effect is fundamentally unobserved. We cast estimator selection as a multiple testing problem and introduce a ground-truth-free procedure based on a cross-fitted, exponentially weighted test statistic. A key component of our method is a two-way sample splitting scheme that decouples nuisance estimation from weight learning and ensures the stability required for valid inference. Leveraging a stability-based central limit theorem, we establish asymptotic familywise error rate control under mild regularity conditions. Empirically, our procedure provides reliable error control while substantially reducing false selections compared with commonly used methods across ACIC 2016, IHDP, and Twins benchmarks, demonstrating that our method is feasible and powerful even without ground-truth treatment effects.
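To make the weight-learning split concrete, the following is a minimal sketch of how an exponentially weighted statistic could be cross-fitted across two folds. It is an illustration under stated assumptions, not the paper's procedure: the input `diffs` (per-sample loss differences between a reference candidate and its competitors), the tuning parameter `lam`, and the function names are all hypothetical, nuisance estimation is omitted entirely, and no claim is made about the calibration of the resulting statistic.

```python
import numpy as np

def exp_weights(mean_diffs, lam):
    """Softmax-style weights: a larger mean difference receives more weight."""
    z = lam * (mean_diffs - mean_diffs.max())  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def split_statistic(diffs, lam=1.0, seed=0):
    """Cross-fitted, exponentially weighted statistic (illustrative only).

    diffs: (n, K) array of hypothetical per-sample loss differences,
           assumed to be computed elsewhere (e.g. from pseudo-outcomes).
    """
    rng = np.random.default_rng(seed)
    n = diffs.shape[0]
    perm = rng.permutation(n)
    folds = (perm[: n // 2], perm[n // 2:])
    pooled = []
    # Learn weights on one fold, score the other fold, then swap roles,
    # so no sample is scored with weights learned from itself.
    for learn, evaluate in (folds, folds[::-1]):
        w = exp_weights(diffs[learn].mean(axis=0), lam)
        pooled.append(diffs[evaluate] @ w)
    scores = np.concatenate(pooled)
    # Studentized mean of the pooled weighted scores.
    return float(np.sqrt(n) * scores.mean() / scores.std(ddof=1))
```

The point of the sketch is the separation of roles: weights are always learned on data disjoint from the samples they score, which is the kind of decoupling the abstract attributes to its sample splitting scheme.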