Data-driven Policy Learning for Continuous Treatments

📅 2024-02-04
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
Learning continuous treatment policies from observational data faces three key challenges: nonparametric welfare estimation, infinite-dimensional policy spaces, and shape constraints (e.g., monotonicity or convexity). Method: We propose a novel paradigm that approximates the shape-constrained policy space via a sequence of finite-dimensional subspaces; we develop a data-adaptive tuned penalization algorithm, integrating kernel-based welfare estimation, machine learning-based propensity score modeling, regularized optimization, and constrained function approximation. Contribution/Results: We establish, for the first time, an oracle inequality for welfare regret under continuous treatments. Theoretically, our estimator achieves statistically optimal convergence rates under both known and unknown propensity scores. Empirically, it significantly enhances out-of-sample policy extrapolation robustness and real-world effectiveness.

๐Ÿ“ Abstract
This paper studies policy learning for continuous treatments from observational data. Continuous treatments pose greater challenges than discrete ones: population welfare may require nonparametric estimation, and the policy space may be infinite-dimensional and subject to shape restrictions. We propose to approximate the policy space with a sequence of finite-dimensional spaces and, for any given policy, obtain the empirical welfare by applying the kernel method. We consider two cases: known and unknown propensity scores. In the latter case, we allow for machine learning of the propensity score and modify the empirical welfare to account for the effect of machine learning. The learned policy maximizes the empirical welfare, or the modified empirical welfare, over the approximating space. In both cases, we modify the penalty algorithm proposed by Mbakop and Tabord-Meehan (2021) to data-automate the tuning parameters (i.e., the bandwidth and the dimension of the approximating space) and establish an oracle inequality for the welfare regret.
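The kernel-based empirical welfare described in the abstract can be sketched in code. This is an illustrative reconstruction, not the paper's implementation: the `empirical_welfare` function, the Epanechnikov kernel choice, the simulated data, and the known uniform generalized propensity score f(t|x) = 1/2 are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel; any standard second-order kernel works similarly."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def empirical_welfare(policy, X, T, Y, cond_density, h):
    """Kernel-weighted empirical welfare of a continuous-treatment policy:

        W_hat(pi) = (1/n) * sum_i K_h(pi(X_i) - T_i) * Y_i / f(T_i | X_i)

    where f(t|x) is the (known) generalized propensity score and h is the
    bandwidth. Only observations whose realized treatment is close to the
    policy's prescription contribute, reweighted by the inverse propensity.
    """
    u = (policy(X) - T) / h
    return np.mean(epanechnikov(u) * Y / (h * cond_density(T, X)))

# Simulated example: treatments drawn uniformly on [-1, 1], so f(t|x) = 1/2,
# and outcomes peak when the treatment equals 0.5 * x.
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-1, 1, n)
T = rng.uniform(-1, 1, n)
Y = -(T - 0.5 * X) ** 2 + rng.normal(0.0, 0.1, n)

w_opt = empirical_welfare(lambda x: 0.5 * x, X, T, Y, lambda t, x: 0.5, h=0.2)
w_bad = empirical_welfare(lambda x: -x, X, T, Y, lambda t, x: 0.5, h=0.2)
```

In this toy setup the estimator ranks the welfare-maximizing policy pi(x) = 0.5x above a mismatched one; the paper's contribution lies in doing this over an infinite-dimensional, shape-constrained policy class with a data-driven bandwidth.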
Problem

Research questions and friction points this paper is trying to address.

Learning optimal continuous treatment policies from observational data
Addressing infinite-dimensional policy spaces with shape restrictions
Automating tuning parameters for empirical welfare maximization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Approximates the policy space with a sequence of finite-dimensional spaces
Uses kernel method for empirical welfare estimation
Automates tuning parameters via modified penalty algorithm
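The three points above can be combined into a toy model-selection loop: maximize kernel-based empirical welfare within polynomial policy spaces of growing dimension, then pick the dimension by penalized welfare. This is only a sketch under assumed ingredients: the polynomial sieve, the Nelder-Mead optimizer, the known uniform propensity f(t|x) = 1/2, and the simple c * sqrt(d/n) penalty all stand in for the paper's data-driven penalization, which is more refined.

```python
import numpy as np
from scipy.optimize import minimize

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def welfare(beta, X, T, Y, h):
    """Empirical welfare of the polynomial policy pi(x) = sum_j beta[j] * x**j,
    assuming a known uniform generalized propensity score f(t|x) = 1/2."""
    pi = np.polynomial.polynomial.polyval(X, beta)
    return np.mean(epanechnikov((pi - T) / h) * Y / (h * 0.5))

rng = np.random.default_rng(1)
n = 4000
X = rng.uniform(-1, 1, n)
T = rng.uniform(-1, 1, n)
Y = -(T - 0.5 * X) ** 2 + rng.normal(0.0, 0.1, n)
h, c = 0.25, 1.0  # bandwidth and penalty constant (illustrative values only)

best = None
for d in range(1, 5):  # dimension of the approximating polynomial space
    res = minimize(lambda b: -welfare(b, X, T, Y, h),
                   x0=np.zeros(d), method="Nelder-Mead")
    score = -res.fun - c * np.sqrt(d / n)  # penalized empirical welfare
    if best is None or score > best[0]:
        best = (score, d, res.x)
```

Since the true optimal policy 0.5x is linear, the penalized criterion favors a low-dimensional space beyond constants: richer spaces gain little welfare but pay a larger complexity penalty, which is the trade-off the paper's oracle inequality formalizes.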
Authors
Chunrong Ai, The Chinese University of Hong Kong, Shenzhen (econometrics)
Yue Fang, School of Management and Economics, The Chinese University of Hong Kong, Shenzhen
Haitian Xie, Peking University (Economics)