Active Measuring in Reinforcement Learning With Delayed Negative Effects

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of costly state measurements with delayed consequences in reinforcement learning—e.g., the burden that surveys place on users in digital health—by proposing the Actively Observable Markov Decision Process (AOMDP), which, for the first time, models measurement actions as controllable decisions whose negative effects arrive with a delay. Methodologically, the authors reformulate the AOMDP as a periodic partially observable MDP and design an online RL algorithm based on belief states, employing sequential Monte Carlo to jointly approximate the posterior over unknown static environment parameters and unobserved latent states. Theoretically and empirically, they show that the reduced uncertainty from measuring can improve sample efficiency and the value of the optimal policy despite the measurement costs. In a digital health task, the agent autonomously balances the timing of questionnaire assessments against intervention decisions, improving long-term health outcomes while reducing measurement burden. The core contributions are (i) a principled model of active observation under delayed negative effects and (ii) an efficient, belief-based online algorithm grounded in sequential Bayesian inference.
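To make the setup concrete, here is a minimal sketch of the AOMDP interaction pattern described above. The names (`LatentHealthEnv`, `MEASURE`, `INTERVENE`) and the toy dynamics are illustrative assumptions, not the paper's model: at each step the agent picks both a control action and a measurement decision; measuring reveals the latent state now but accumulates burden that degrades future transitions, which is the delayed negative effect.

```python
import random

MEASURE, SKIP = 1, 0      # measurement decision (hypothetical encoding)
INTERVENE, WAIT = 1, 0    # control (intervention) decision

class LatentHealthEnv:
    """Toy environment: a binary latent health state that drifts over time.
    Measuring reveals the state now but lowers future transition quality."""
    def __init__(self):
        self.state = 1        # 1 = healthy, 0 = at risk (latent)
        self.burden = 0.0     # accumulated measurement burden

    def step(self, control, measure):
        # The measurement's harm is delayed: it degrades *future* dynamics.
        if measure == MEASURE:
            self.burden += 0.05
        p_stay_healthy = 0.8 + 0.15 * control - self.burden
        p_stay_healthy = max(0.0, min(1.0, p_stay_healthy))
        self.state = 1 if random.random() < p_stay_healthy else 0
        reward = float(self.state)
        # State stays latent unless the agent pays to measure it.
        obs = self.state if measure == MEASURE else None
        return obs, reward

env = LatentHealthEnv()
for t in range(10):
    measure = MEASURE if t % 5 == 0 else SKIP   # e.g., survey every 5th step
    obs, r = env.step(control=INTERVENE, measure=measure)
    print(t, obs, r)
```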

📝 Abstract
Measuring states in reinforcement learning (RL) can be costly in real-world settings and may negatively influence future outcomes. We introduce the Actively Observable Markov Decision Process (AOMDP), where an agent not only selects control actions but also decides whether to measure the latent state. The measurement action reveals the true latent state but may have a negative delayed effect on the environment. We show that this reduced uncertainty may provably improve sample efficiency and increase the value of the optimal policy despite these costs. We formulate an AOMDP as a periodic partially observable MDP and propose an online RL algorithm based on belief states. To approximate the belief states, we further propose a sequential Monte Carlo method to jointly approximate the posterior of unknown static environment parameters and unobserved latent states. We evaluate the proposed algorithm in a digital health application, where the agent decides when to deliver digital interventions and when to assess users' health status through surveys.
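For context, the belief-state approach rests on the standard POMDP belief recursion (generic notation, not the paper's):

b_{t+1}(s') ∝ O(o_{t+1} | s', a_t) · Σ_s T(s' | s, a_t) · b_t(s)

where T is the transition kernel, O the observation model, and b_t the belief over latent states. When the agent takes the measurement action, the observation equals the latent state and the belief collapses to a point mass; otherwise the belief stays dispersed and the recursion must be approximated, which is where the paper's sequential Monte Carlo method enters.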
Problem

Research questions and friction points this paper is trying to address.

Optimizing measurement decisions in RL under delayed negative effects
Reducing uncertainty while managing costly state observations
Jointly estimating latent states and environment parameters efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent chooses both control actions and measurement timing
Formulates the problem as a periodic partially observable MDP
Uses sequential Monte Carlo for belief-state approximation (see the sketch below)
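Below is a hedged sketch of the kind of sequential Monte Carlo (particle filter) update the last bullet refers to: each particle carries a draw of the static environment parameter together with a latent-state value, so resampling tracks their joint posterior. The linear-Gaussian model and all names are assumptions for illustration; the sharp likelihood on measured steps stands in for the exact state revelation described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
theta = rng.normal(0.0, 1.0, size=N)    # particles for the static parameter
state = rng.normal(0.0, 1.0, size=N)    # particles for the latent state
weights = np.full(N, 1.0 / N)

def smc_step(obs, action):
    """One assumed filtering step: propagate, reweight, resample."""
    global theta, state, weights
    # 1. Propagate the latent state under a hypothetical linear-Gaussian model.
    state = theta * state + action + rng.normal(0.0, 0.5, size=N)
    if obs is not None:
        # Measurement taken: a sharp likelihood approximates exact revelation.
        weights = np.exp(-0.5 * ((obs - state) / 0.1) ** 2)
    else:
        # No measurement: no new evidence, so weights stay uniform.
        weights = np.ones(N)
    weights /= weights.sum()
    # 2. Resample (theta, state) jointly, so the parameter posterior and the
    #    latent-state posterior are updated together.
    idx = rng.choice(N, size=N, p=weights)
    theta, state, weights = theta[idx], state[idx], np.full(N, 1.0 / N)

smc_step(obs=0.3, action=1.0)    # a step where the agent measured
smc_step(obs=None, action=0.0)   # a step without measurement
print("posterior mean of theta:", theta.mean())
```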
Daiqi Gao
Harvard University
Ziping Xu
Postdoctoral Fellow at Harvard University
Statistical Reinforcement Learning · Machine Learning Theory · Mobile Health
Aseel Rawashdeh
Harvard University
Predrag Klasnja
University of Michigan
Susan A. Murphy
Harvard University