🤖 AI Summary
To address the limited theoretical guarantees for average treatment effect (ATE) estimation under high-dimensional covariates (where $p \gg n^{1/2}$) in completely randomized experiments, this paper proposes a higher-order regression adjustment method based on a Neumann series expansion within a design-based finite-population framework. The method requires no parametric modeling assumptions and relies solely on randomization inference, improving asymptotically on ordinary least squares (OLS) regression adjustment via a degree-$d$ Neumann correction. Theoretically, the corrected estimator is asymptotically normal provided $p^{d+3}(\log p)^{d+1} = o(n^{d+2})$, a substantially weaker dimensionality constraint than existing conditions such as $p = o(n^{1/2})$ or $p = o(n^{2/3})$. This result relaxes a theoretical bottleneck in design-based ATE estimation and extends the applicability of regression adjustment to a strictly larger range of growth rates for $p$.
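To make the dimensionality condition concrete: ignoring the logarithmic factor, $p^{d+3} = o(n^{d+2})$ is equivalent to $p = o(n^{(d+2)/(d+3)})$. The short sketch below tabulates that exponent for a few values of $d$. The function name `max_growth_exponent` is hypothetical, and treating $d$ as a nonnegative integer starting at 0 is an assumption, since the summary does not state the range of $d$.

```python
from fractions import Fraction


def max_growth_exponent(d: int) -> Fraction:
    """Exponent a such that p^(d+3) = o(n^(d+2)) reads p = o(n^a).

    The (log p)^(d+1) factor in the stated condition is ignored here,
    so this is only the polynomial part of the rate.
    """
    return Fraction(d + 2, d + 3)


# Tabulate the exponent for a few correction degrees (assumed d >= 0).
for d in range(4):
    print(f"d = {d}: p = o(n^({max_growth_exponent(d)})) up to log factors")
```

The exponent $(d+2)/(d+3)$ increases with $d$ but stays below 1, so under this condition $p$ must still grow more slowly than $n$.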
📝 Abstract
We study average treatment effect (ATE) estimation under complete randomization with many covariates in a design-based, finite-population framework. In randomized experiments, regression adjustment can use covariates to improve the precision of estimators without requiring a correctly specified outcome model. However, existing design-based analyses establish asymptotic normality only up to $p = o(n^{1/2})$, extendable to $p = o(n^{2/3})$ with a single de-biasing. We introduce a novel theoretical perspective on the asymptotic properties of regression adjustment through a Neumann-series decomposition, yielding systematic higher-degree corrections and a refined analysis of regression adjustment. Specifically, for ordinary least squares regression adjustment, the Neumann expansion sharpens the analysis of the remainder term relative to the residual difference-in-means. Under mild leverage regularity, we show that the degree-$d$ Neumann-corrected estimator is asymptotically normal whenever $p^{d+3}(\log p)^{d+1} = o(n^{d+2})$, strictly enlarging the admissible growth of $p$. The analysis is purely randomization-based and does not impose any parametric outcome models or super-population assumptions.
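As background for the Neumann-series idea, the following is a minimal sketch of the generic matrix identity $(I - A)^{-1} = \sum_{k \ge 0} A^k$, valid when the spectral norm of $A$ is below 1, truncated at degree $d$. It is not the paper's estimator: the abstract does not say which matrix is expanded, so the random matrix `a`, the scaling to norm 0.3, and the helper `neumann_inverse` are all illustrative assumptions.

```python
import numpy as np


def neumann_inverse(a: np.ndarray, degree: int) -> np.ndarray:
    """Approximate (I - a)^{-1} by the truncated series sum_{k=0}^{degree} a^k."""
    n = a.shape[0]
    term = np.eye(n)   # current power a^k, starting at a^0 = I
    total = np.eye(n)  # running partial sum
    for _ in range(degree):
        term = term @ a
        total = total + term
    return total


# Illustrative matrix (an assumption), rescaled to spectral norm 0.3.
rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
a = 0.3 * m / np.linalg.norm(m, 2)

exact = np.linalg.inv(np.eye(5) - a)
# Spectral-norm error of the degree-d truncation, for d = 0, ..., 4.
errors = [np.linalg.norm(exact - neumann_inverse(a, d), 2) for d in range(5)]
print(errors)
```

The truncation error at degree $d$ is bounded by $\|A\|^{d+1} / (1 - \|A\|)$, so each extra degree shrinks the remainder geometrically. This illustrates, in a generic setting, why higher-degree corrections can leave a smaller remainder term.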