Estimating Dynamic Marginal Policy Effects under Sequential Unconfoundedness

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of estimating the long-term effects of infinitesimal policy changes in dynamic systems where the full state is unobserved and dependencies may be non-Markovian. Under a sequential unconfoundedness assumption, the authors identify and estimate dynamic marginal policy effects (MPEs) through a reduced-form representation that does not rely on full state observability or the Markov property. They also develop a doubly robust estimator that avoids the exponential dependence on the time horizon typical of non-Markovian off-policy evaluation. The method is described as the first to enable efficient and robust off-policy evaluation under general sequential unconfoundedness conditions. Its effectiveness and robustness are demonstrated through simulation studies, including one motivated by a dynamic pricing application.

📝 Abstract

We develop methods for estimating how infinitesimal policy changes affect long-term outcomes in dynamic systems. We show that dynamic marginal policy effects (MPEs) can be identified via tractable reduced-form expressions, and can be estimated under a general sequential unconfoundedness assumption. We also propose a doubly robust estimator for dynamic MPEs. Our approach does not require observing full dynamic state information (as is typically assumed for off-policy evaluation in Markov decision processes), and does not incur an exponential curse of horizon (as is typical in non-Markovian off-policy evaluation). We demonstrate practicality and robustness of our approach in a number of simulations, including one motivated by a dynamic pricing application where people use past prices to form a reference level for current prices.
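
To make "doubly robust" concrete, the following is a minimal sketch in a deliberately simplified one-period setting; it is not the paper's dynamic estimator. It assumes a binary action and a policy that treats with probability θ, in which case the marginal effect of raising θ equals the average treatment effect, and it uses the standard augmented inverse-propensity-weighted (AIPW) score. The function name `aipw_marginal_effect`, the simulated data, and the true effect of 2.0 are all hypothetical choices made for illustration.

```python
import numpy as np


def aipw_marginal_effect(x, a, y, propensity, mu0, mu1):
    """AIPW estimate of E[Y(1) - Y(0)] in a one-period problem.

    propensity, mu0, mu1 are callables giving P(A=1 | x), E[Y | x, A=0]
    and E[Y | x, A=1]. The estimate is consistent if either the
    propensity or the pair of outcome models is correct.
    """
    e = propensity(x)
    m0, m1 = mu0(x), mu1(x)
    scores = (
        m1 - m0                           # outcome-model contrast
        + a * (y - m1) / e                # residual correction, treated units
        - (1 - a) * (y - m0) / (1 - e)    # residual correction, control units
    )
    return scores.mean()


# Hypothetical simulated data: confounded assignment, true effect = 2.0.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
true_propensity = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, true_propensity)
y = 2.0 * a + x + rng.normal(size=n)

# Case 1: correct propensity, deliberately wrong outcome models (all zeros).
est_bad_outcome = aipw_marginal_effect(
    x, a, y,
    propensity=lambda v: 1.0 / (1.0 + np.exp(-v)),
    mu0=lambda v: np.zeros_like(v),
    mu1=lambda v: np.zeros_like(v),
)

# Case 2: deliberately wrong propensity (constant 0.5), correct outcome models.
est_bad_propensity = aipw_marginal_effect(
    x, a, y,
    propensity=lambda v: np.full_like(v, 0.5),
    mu0=lambda v: v,
    mu1=lambda v: v + 2.0,
)

print(f"wrong outcome model, correct propensity: {est_bad_outcome:.3f}")
print(f"wrong propensity, correct outcome model: {est_bad_propensity:.3f}")
```

Both estimates should land near the simulated effect even though one nuisance model is misspecified in each case, which is the property the term "doubly robust" refers to. The dynamic setting in the abstract involves multiple periods and history-dependent policies, which this sketch does not attempt to capture.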

Problem

Research questions and friction points this paper is trying to address.

Dynamic Marginal Policy Effects

Sequential Unconfoundedness

Off-policy Evaluation

Dynamic Systems

Long-term Outcomes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Marginal Policy Effects

Sequential Unconfoundedness

Doubly Robust Estimation

Off-policy Evaluation

Curse of Horizon

🔎 Similar Papers

RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions

📅 2023-12-11

📈 Citations: 0
