🤖 AI Summary
This paper identifies a fundamental accuracy limitation of mainstream “honest” causal tree estimators for heterogeneous treatment effect (HTE) estimation: under basic conditions, they cannot achieve a convergence rate that is polynomial in the sample size $n$. Method: The authors study the honest causal decision tree estimators of Athey and Imbens [2016], and variants thereof, which are fitted with CART or custom variants of it, and establish lower bounds on their estimation error. Contribution/Results: They show that honesty does not resolve this limitation and at best delivers negligible logarithmic improvements in sample size or dimension. As a result, these widely used estimators can perform poorly in practice and can even be inconsistent in some settings; simulations support the theoretical findings. These results challenge the common belief that honesty, combined with the presumed adaptivity of CART, guarantees favorable convergence behavior.
📝 Abstract
Recursive decision trees have emerged as a leading methodology for heterogeneous causal treatment effect estimation and inference in experimental and observational settings. These procedures are fitted using the celebrated CART (Classification And Regression Tree) algorithm [Breiman et al., 1984], or custom variants thereof, and hence are believed to be "adaptive" to high-dimensional data, sparsity, or other specific features of the underlying data generating process. Athey and Imbens [2016] proposed several "honest" causal decision tree estimators, which have become the standard in both academia and industry. We study their estimators, and variants thereof, and establish lower bounds on their estimation error. We demonstrate that these popular heterogeneous treatment effect estimators cannot achieve a polynomial-in-$n$ convergence rate under basic conditions, where $n$ denotes the sample size. Contrary to common belief, honesty does not resolve these limitations and at best delivers negligible logarithmic improvements in sample size or dimension. As a result, these commonly used estimators can exhibit poor performance in practice, and even be inconsistent in some settings. Our theoretical insights are empirically validated through simulations.
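For readers unfamiliar with the term, "honesty" refers to using one part of the sample to choose the tree's splits and a separate part to estimate the treatment effect within each leaf. The following is a minimal sketch of that idea for a depth-one tree on a single covariate, assuming a randomized binary treatment. It is not the estimator of Athey and Imbens [2016]: the function names (`diff_in_means`, `honest_stump`), the candidate cut grid, the `min_leaf` rule, and the splitting score are all illustrative assumptions.

```python
import numpy as np


def diff_in_means(y, d):
    """Treated-minus-control difference in mean outcomes."""
    return y[d == 1].mean() - y[d == 0].mean()


def honest_stump(x, y, d, min_leaf=20, seed=0):
    """Depth-one 'honest' causal tree on a single covariate x.

    Assumptions (illustrative, not taken from the paper):
      - d is a 0/1 integer array from a randomized treatment;
      - the split score n_L * tau_L^2 + n_R * tau_R^2 is a simplified
        stand-in for a causal-tree splitting criterion;
      - both leaves of the estimation half contain treated and
        control units (true for reasonably large samples).
    Returns (cut, tau_left, tau_right).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    split_idx, est_idx = perm[: n // 2], perm[n // 2:]

    # Step 1: choose the cut point using ONLY the splitting half.
    xs, ys, ds = x[split_idx], y[split_idx], d[split_idx]
    best_cut, best_score = None, -np.inf
    for cut in np.quantile(xs, np.linspace(0.1, 0.9, 17)):
        left = xs <= cut
        right = ~left
        # Require enough treated and control units in each leaf.
        arm_counts = [
            ds[left].sum(), (1 - ds[left]).sum(),
            ds[right].sum(), (1 - ds[right]).sum(),
        ]
        if min(arm_counts) < min_leaf:
            continue
        tau_l = diff_in_means(ys[left], ds[left])
        tau_r = diff_in_means(ys[right], ds[right])
        score = left.sum() * tau_l**2 + right.sum() * tau_r**2
        if score > best_score:
            best_cut, best_score = cut, score
    if best_cut is None:
        raise ValueError("no admissible split; lower min_leaf or add data")

    # Step 2: estimate leaf effects using ONLY the held-out half.
    xe, ye, de = x[est_idx], y[est_idx], d[est_idx]
    left = xe <= best_cut
    return (
        best_cut,
        diff_in_means(ye[left], de[left]),
        diff_in_means(ye[~left], de[~left]),
    )
```

Because the leaf estimates come from data that played no role in choosing the cut, they are not biased by the split search. The abstract's point is that this separation alone does not deliver polynomial-in-$n$ convergence for the estimators studied.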