🤖 AI Summary
Learning continuous treatment policies from observational data faces three key challenges: nonparametric welfare estimation, an infinite-dimensional policy space, and shape constraints (e.g., monotonicity or convexity). Method: We propose a novel paradigm that approximates the shape-constrained policy space by a sequence of finite-dimensional subspaces, and we develop a data-adaptive penalization algorithm that integrates kernel-based welfare estimation, machine learning-based propensity score modeling, regularized optimization, and constrained function approximation. Contribution/Results: We establish, for the first time, an oracle inequality for welfare regret under continuous treatments. Theoretically, our estimator achieves statistically optimal convergence rates under both known and unknown propensity scores. Empirically, it significantly improves out-of-sample robustness and real-world policy performance.
📝 Abstract
This paper studies policy learning for continuous treatments from observational data. Continuous treatments pose greater challenges than discrete ones: population welfare may require nonparametric estimation, and the policy space may be infinite-dimensional and subject to shape restrictions. We propose to approximate the policy space with a sequence of finite-dimensional spaces and, for any given policy, to obtain the empirical welfare by the kernel method. We consider two cases: known and unknown propensity scores. In the latter case, we allow for machine learning of the propensity score and modify the empirical welfare to account for the effect of machine learning. The learned policy maximizes the empirical welfare, or the modified empirical welfare, over the approximating space. In both cases, we modify the penalty algorithm proposed in Mbakop and Tabord-Meehan (2021) to data-automate the tuning parameters (i.e., the bandwidth and the dimension of the approximating space) and establish an oracle inequality for the welfare regret.
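To make the kernel-based empirical welfare concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the known-propensity-score case: for a candidate policy π, each observation is weighted by a kernel centered at the gap between the observed dose and the policy's recommended dose, and inverse-weighted by the conditional treatment density. All function and variable names here are my own; the Gaussian kernel and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def empirical_welfare(policy, X, D, Y, propensity, h):
    """Kernel-localized inverse-propensity estimate of welfare.

    policy(X)         -> recommended dose for each covariate value
    propensity(d, x)  -> conditional density of dose d given x (known here)
    h                 -> bandwidth (a tuning parameter; the paper's
                         algorithm selects it data-adaptively)
    """
    # Localize around observations whose dose is close to the policy's dose.
    weights = gaussian_kernel((D - policy(X)) / h) / h
    ps = np.array([propensity(d, x) for d, x in zip(D, X)])
    return float(np.mean(weights * Y / ps))

# Synthetic example: doses uniform on [0, 1], so the conditional density is 1,
# and outcomes peak at dose 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(size=500)
D = rng.uniform(size=500)
Y = 1.0 - (D - 0.5) ** 2

w_good = empirical_welfare(lambda x: np.full_like(x, 0.5),
                           X, D, Y, lambda d, x: 1.0, h=0.1)
w_bad = empirical_welfare(lambda x: np.full_like(x, 0.95),
                          X, D, Y, lambda d, x: 1.0, h=0.1)
```

In this toy setup the constant policy at 0.5 should receive a higher estimated welfare than the policy at 0.95, since outcomes are highest near dose 0.5. The learned policy in the paper maximizes such an estimate over a finite-dimensional approximating space, with the bandwidth and dimension chosen by the penalized algorithm.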