Semiparametric Efficiency in Policy Learning with General Treatments

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses policy learning with general treatment variables (discrete, continuous, or mixed) and establishes a unified semiparametrically efficient estimation framework. It characterizes asymptotic efficiency bounds for welfare regret under both deterministic and randomized policies. Methodologically, it introduces a new notion of efficiency for the asymptotic distribution of welfare regret, overcoming the failure of pathwise differentiability along conventional parametric paths for parameters arising from deterministic policies; it derives the asymptotic distributions of common policy estimators via pathwise differentiability analysis and the convolution theorem; and it identifies a policy-learning analogue of the Hirano-Imbens-Ridder (HIR) phenomenon, in which the inverse propensity weighting (IPW) estimator with an estimated propensity score is efficient while the same estimator with the true propensity is not. The analysis also shows that inefficient policy estimators both inflate the variance of the asymptotic regret and shift its mean upward. The theoretical results are illustrated with an empirically calibrated simulation based on a job training program and an empirical application to a commitment savings program.

📝 Abstract
Recent literature on policy learning has primarily focused on regret bounds of the learned policy. We provide a new perspective by developing a unified semiparametric efficiency framework for policy learning, allowing for general treatments that are discrete, continuous, or mixed. We provide a characterization of the failure of pathwise differentiability for parameters arising from deterministic policies. We then establish efficiency bounds for pathwise differentiable parameters in randomized policies, both when the propensity score is known and when it must be estimated. Building on the convolution theorem, we introduce a notion of efficiency for the asymptotic distribution of welfare regret, showing that inefficient policy estimators not only inflate the variance of the asymptotic regret but also shift its mean upward. We derive the asymptotic theory of several common policy estimators, with a key contribution being a policy-learning analogue of the Hirano-Imbens-Ridder (HIR) phenomenon: the inverse propensity weighting estimator with an estimated propensity is efficient, whereas the same estimator using the true propensity is not. We illustrate the theoretical results with an empirically calibrated simulation study based on data from a job training program and an empirical application to a commitment savings program.
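The HIR phenomenon mentioned in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not from the paper; the data-generating process, policy, and all names below are hypothetical. It compares the sampling variance of an IPW welfare estimate for a "treat everyone" policy when the (constant) propensity score is plugged in as its known value versus estimated by the sample treatment frequency:

```python
import numpy as np

def ipw_welfare(seed, use_estimated_propensity, n=2000):
    """IPW estimate of the welfare of the 'treat everyone' policy
    in a toy randomized experiment with true propensity e(x) = 0.5."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # covariate
    d = rng.binomial(1, 0.5, size=n)            # randomized treatment
    y = 1.0 + 0.5 * d + x + rng.normal(size=n)  # outcome; E[Y(1)] = 1.5
    # Propensity: either the known constant 0.5 or its nonparametric
    # estimate (the sample treatment frequency).
    e = np.full(n, d.mean()) if use_estimated_propensity else np.full(n, 0.5)
    # W_hat = mean( 1{D = pi(X)} * Y / e ), with pi(x) = 1 for all x.
    return np.mean((d == 1) * y / e)

reps = 2000
w_known = np.array([ipw_welfare(s, False) for s in range(reps)])
w_est = np.array([ipw_welfare(s, True) for s in range(reps)])

print(f"known propensity:     mean={w_known.mean():.3f}, var={w_known.var():.5f}")
print(f"estimated propensity: mean={w_est.mean():.3f}, var={w_est.var():.5f}")
```

Both estimators are consistent for the policy value E[Y(1)] = 1.5, but across replications the version using the estimated propensity exhibits a smaller variance, consistent with the HIR-style result that plugging in an estimated propensity can be more efficient than using the true one.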
Problem

Research questions and friction points this paper is trying to address.

How can semiparametric efficiency be characterized for policy learning with general (discrete, continuous, or mixed) treatments?
What are the efficiency bounds for randomized policies when the propensity score is known versus when it must be estimated?
How does the efficiency of the policy estimator affect the asymptotic distribution of welfare regret?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified semiparametric efficiency framework for policy learning with general treatments
Efficiency bounds for randomized policies under known or estimated propensity scores
Policy-learning analogue of the HIR phenomenon: IPW with an estimated propensity is efficient, while the same estimator with the true propensity is not