🤖 AI Summary
This paper addresses the challenge of estimating heterogeneous dose–response curves (HDRCs) under long-term continuous treatment, where existing methods rely on strong assumptions—such as no unmeasured confounding and binary treatment—that hinder personalized decision-making. We propose an optimal transport-based weighting framework for data alignment, the first to incorporate optimal transport into long-term causal inference; it mitigates bias from unmeasured confounding via reweighting. We derive a generalization bound for counterfactual prediction under the reweighted distribution and jointly model continuous dosing and individual-level heterogeneous treatment effects. On synthetic and semi-synthetic benchmarks, our HDRC estimator reduces estimation error by over 30% compared to state-of-the-art methods, demonstrating both the tightness of our theoretical bound and the robustness of the estimator.
📝 Abstract
Long-term treatment effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions, such as no unobserved confounders or binary treatment, to estimate long-term average treatment effects. However, in numerous real-world applications, these assumptions could be violated, and average treatment effects are insufficient for personalized decision-making. In this paper, we address a more general problem of estimating long-term Heterogeneous Dose-Response Curve (HDRC) while accounting for unobserved confounders and continuous treatment. Specifically, to remove the unobserved confounders in the long-term observational data, we introduce an optimal transport weighting framework to align the long-term observational data to an auxiliary short-term experimental data. Furthermore, to accurately predict the heterogeneous effects of continuous treatment, we establish a generalization bound on counterfactual prediction error by leveraging the reweighted distribution induced by optimal transport. Finally, we develop a long-term HDRC estimator building upon the above theoretical foundations. Extensive experiments on synthetic and semi-synthetic datasets demonstrate the effectiveness of our approach.