🤖 AI Summary
Learning continuous treatment policies from observational data faces three key challenges: nonparametric welfare estimation, an infinite-dimensional policy space, and shape constraints (e.g., monotonicity or convexity). Method: We propose a novel paradigm that approximates the shape-constrained policy space by a sequence of finite-dimensional subspaces, and we develop a data-adaptive penalization algorithm that integrates kernel-based welfare estimation, machine learning-based propensity score modeling, regularized optimization, and constrained function approximation. Contribution/Results: We establish, for the first time, an oracle inequality for welfare regret under continuous treatments. Theoretically, our estimator achieves statistically optimal convergence rates under both known and unknown propensity scores. Empirically, it significantly improves out-of-sample robustness and real-world policy performance.
📝 Abstract
This paper studies policy learning for continuous treatments from observational data. Continuous treatments pose greater challenges than discrete ones: population welfare may require nonparametric estimation, and the policy space may be infinite-dimensional and subject to shape restrictions. We propose to approximate the policy space with a sequence of finite-dimensional spaces and, for any given policy, to obtain the empirical welfare by the kernel method. We consider two cases: known and unknown propensity scores. In the latter case, we allow for machine learning of the propensity score and modify the empirical welfare to account for the effect of machine learning. The learned policy maximizes the empirical welfare, or the modified empirical welfare, over the approximating space. In both cases, we modify the penalty algorithm proposed in Mbakop and Tabord-Meehan (2021) to data-automate the tuning parameters (i.e., the bandwidth and the dimension of the approximating space) and establish an oracle inequality for the welfare regret.
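To make the kernel-based empirical welfare concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the known-propensity-score case: for a candidate policy π, each observation is weighted by a kernel centered at the gap between the observed dose and the policy's recommended dose, and inverse-weighted by the conditional treatment density. All function and variable names here are my own; the Gaussian kernel and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def empirical_welfare(policy, X, D, Y, propensity, h):
    """Kernel-localized inverse-propensity estimate of welfare.

    policy(X)         -> recommended dose for each covariate value
    propensity(d, x)  -> conditional density of dose d given x (known here)
    h                 -> bandwidth (a tuning parameter; the paper's
                         algorithm selects it data-adaptively)
    """
    # Localize around observations whose dose is close to the policy's dose.
    weights = gaussian_kernel((D - policy(X)) / h) / h
    ps = np.array([propensity(d, x) for d, x in zip(D, X)])
    return float(np.mean(weights * Y / ps))

# Synthetic example: doses uniform on [0, 1], so the conditional density is 1,
# and outcomes peak at dose 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(size=500)
D = rng.uniform(size=500)
Y = 1.0 - (D - 0.5) ** 2

w_good = empirical_welfare(lambda x: np.full_like(x, 0.5),
                           X, D, Y, lambda d, x: 1.0, h=0.1)
w_bad = empirical_welfare(lambda x: np.full_like(x, 0.95),
                          X, D, Y, lambda d, x: 1.0, h=0.1)
```

In this toy setup the constant policy at 0.5 should receive a higher estimated welfare than the policy at 0.95, since outcomes are highest near dose 0.5. The learned policy in the paper maximizes such an estimate over a finite-dimensional approximating space, with the bandwidth and dimension chosen by the penalized algorithm.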