🤖 AI Summary
This paper addresses robust estimation of average partial effects (APEs) in nonlinear models in moderate-dimensional settings. The authors propose a double machine learning framework that dispenses with linearity and differentiability assumptions on the regression model, permitting arbitrary black-box machine learning algorithms as first-stage estimators. The method introduces resmoothing to confer differentiability on otherwise non-differentiable estimators, models the conditional distribution of the predictors through a flexible location-scale model, and constructs a doubly robust semiparametric inference procedure. Theoretically, the estimator is shown to achieve the semiparametric efficiency bound and to remain robust under model misspecification and other nonstandard conditions. Numerical experiments demonstrate substantial improvements over existing APE estimators in both estimation accuracy and confidence-interval coverage.
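To fix notation (ours, not necessarily the paper's), the standard formulation of the APE and its doubly robust moment from this literature can be written as:

```latex
% Target: the average partial effect of X on Y, where
% m(x, z) = E[Y | X = x, Z = z] is the regression function:
\theta = \mathbb{E}\!\left[\frac{\partial m}{\partial x}(X, Z)\right].

% Integration by parts (boundary terms vanishing) gives a dual form via
% the conditional score \ell(x \mid z) = \partial_x \log f(x \mid z):
\theta = -\,\mathbb{E}\!\left[\ell(X \mid Z)\, Y\right].

% Combining the two yields a doubly robust moment function, whose mean
% equals \theta if either m or f is correctly specified:
\psi(Y, X, Z) = \frac{\partial m}{\partial x}(X, Z)
  - \ell(X \mid Z)\,\bigl(Y - m(X, Z)\bigr).

% A location-scale model for X given Z supplies an estimate of \ell:
X = \mu(Z) + \sigma(Z)\,\varepsilon, \qquad \varepsilon \perp\!\!\!\perp Z .
```

This sketch shows why both nuisance functions matter: the regression $m$ enters through its derivative, so a non-differentiable first-stage fit must be smoothed before it can be plugged in.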
📝 Abstract
Single-parameter summaries of variable effects are desirable for ease of interpretation, but linear models, which would deliver these, may fit poorly to the data. A modern approach is to estimate the average partial effect -- the average slope of the regression function with respect to the predictor of interest -- using a doubly robust semiparametric procedure. Most existing work has focused on specific forms of nuisance function estimators. We extend the scope to arbitrary plug-in nuisance function estimation, allowing for the use of modern machine learning methods which in particular may deliver non-differentiable regression function estimates. Our procedure involves resmoothing a user-chosen first-stage regression estimator to produce a differentiable version, and modelling the conditional distribution of the predictors through a location-scale model. We show that our proposals lead to a semiparametric efficient estimator under relatively weak assumptions. Our theory makes use of a new result on the sub-Gaussianity of Lipschitz score functions that may be of independent interest. We demonstrate the attractive numerical performance of our approach in a variety of settings including ones with misspecification.
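The resmoothing idea described above can be sketched in code: convolve a fitted (possibly non-differentiable) regression function with a Gaussian kernel in the coordinate of interest, which yields an infinitely differentiable version whose derivative is available via Stein's identity. This is an illustrative sketch under our own notation, not the paper's exact construction; the bandwidth `h` and Monte Carlo size are ad hoc choices.

```python
import numpy as np

def resmooth(m_hat, h, n_mc=2000, rng=None):
    """Return a smoothed version of m_hat and its derivative.

    m_hat : callable, a fitted first-stage regression function of a
            scalar predictor (possibly non-differentiable).
    h     : bandwidth of the Gaussian smoothing kernel.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.standard_normal(n_mc)  # kernel draws, reused for smoothness

    def m_smooth(x):
        # m_h(x) = E[m_hat(x + h Z)], Z ~ N(0, 1): Gaussian convolution.
        return np.mean([m_hat(x + h * zi) for zi in z])

    def m_smooth_deriv(x):
        # Stein's identity: d/dx m_h(x) = E[Z * m_hat(x + h Z)] / h,
        # so no derivative of m_hat itself is ever needed.
        vals = np.array([m_hat(x + h * zi) for zi in z])
        return np.mean(z * vals) / h

    return m_smooth, m_smooth_deriv

# Example: a piecewise-constant first-stage fit (like the prediction
# function of a regression tree), which has no derivative at 0.
step_fit = lambda x: np.where(x < 0.0, 1.0, 2.0)
m_h, dm_h = resmooth(step_fit, h=0.5)
```

Away from the jump, `m_h` agrees with `step_fit` and `dm_h` is near zero; at the jump, `dm_h` reports a smeared-out positive slope instead of being undefined.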