🤖 AI Summary
Solving planning in POMDPs with continuous state, action, and observation spaces—central to robotics and autonomous systems yet poorly served by purely sample-based methods—remains a fundamental challenge. This paper proposes Action Gradient Monte Carlo Tree Search (AGMCTS), an MCTS variant built on a novel Multiple Importance Sampling (MIS) tree that shares value information between sibling action branches and supports updating actions during the search, such as by gradient steps. The authors introduce an online Monte Carlo value gradient estimator based on transition likelihoods, formulated for MDPs and extended to POMDPs via particle beliefs with the propagated belief trick; in practice it is computed using the MIS tree with efficient Monte Carlo sampling. Experiments in a simulated environment demonstrate AGMCTS's applicability and its advantages over continuous online POMDP solvers that rely solely on sampling.
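To make the MIS tree idea concrete: Multiple Importance Sampling combines samples drawn from several proposal distributions into one unbiased estimate of an expectation under a target distribution, typically via Veach's balance heuristic. The sketch below is a toy stand-alone illustration with Gaussian proposals; the densities, function names, and parameters are illustrative assumptions, not the proposals AGMCTS actually maintains in its search tree.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mis_estimate(f, target_pdf, proposals, n_per, rng=None):
    """Balance-heuristic MIS estimate of E_p[f(x)] for target density p.

    Each proposal is a (mu, sigma) Gaussian, sampled n_per times. With the
    balance heuristic, every sample x contributes
        f(x) * p(x) / sum_j (n_j * q_j(x)),
    which keeps the combined estimator unbiased regardless of which
    proposal the sample came from.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for mu, sigma in proposals:
        for _ in range(n_per):
            x = rng.gauss(mu, sigma)
            mixture = sum(n_per * normal_pdf(x, m, s) for m, s in proposals)
            total += f(x) * target_pdf(x) / mixture
    return total
```

The key property mirrored from the paper's setting: samples generated under one proposal (one action branch) still contribute to the shared estimate, so proposals can be added or moved (as gradient updates move actions) without discarding past samples.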
📝 Abstract
Solving Partially Observable Markov Decision Processes (POMDPs) in continuous state, action, and observation spaces is key to autonomous planning in many real-world mobility and robotics applications. Current approaches are mostly sample-based and cannot hope to reach near-optimal solutions in reasonable time. We propose two complementary theoretical contributions. First, we formulate a novel Multiple Importance Sampling (MIS) tree for value estimation that allows value information to be shared between sibling action branches. The MIS tree supports action updates during search time, such as gradient-based updates. Second, we propose a novel methodology to compute value gradients with online sampling based on transition likelihoods. It is applicable to MDPs, and we extend it to POMDPs via particle beliefs with the application of the propagated belief trick. In practice, the gradient estimator is computed using the MIS tree with efficient Monte Carlo sampling. These two parts are combined into a new planning algorithm, Action Gradient Monte Carlo Tree Search (AGMCTS). We demonstrate its applicability in a simulated environment, show its advantages over continuous online POMDP solvers that rely solely on sampling, and discuss further implications.
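The "value gradients with online sampling based on transition likelihoods" idea can be illustrated with the classic likelihood-ratio (score-function) identity: since ∇_a E_{s'∼p(·|s,a)}[r(s')] = E[r(s') ∇_a log p(s'|s,a)], an action gradient can be estimated from sampled transitions without differentiating through the simulator. The toy model below (a 1-D Gaussian transition with a quadratic reward) is an assumption for illustration only and is much simpler than the paper's tree-based estimator.

```python
import math
import random

SIGMA = 0.5  # transition noise of the toy model s' ~ N(s + a, SIGMA^2)

def dlogp_da(s_next, s, a):
    """Analytic d/da of log p(s'|s,a) for the Gaussian toy transition."""
    return (s_next - (s + a)) / SIGMA**2

def value_gradient_estimate(s, a, n=20000, rng=None):
    """Score-function Monte Carlo estimate of d/da E[r(s')].

    Samples transitions s' ~ N(s + a, SIGMA^2), evaluates the reward
    r(s') = -(s')^2, and averages r(s') * d log p(s'|s,a)/da. For this
    model the true gradient is -2(s + a), which the estimate approaches
    as n grows.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n):
        s_next = s + a + SIGMA * rng.gauss(0.0, 1.0)
        reward = -s_next * s_next
        total += reward * dlogp_da(s_next, s, a)
    return total / n
```

In the paper's POMDP setting the same likelihood-ratio weighting is applied over particle beliefs rather than single states, with the propagated belief trick keeping the belief updates consistent as actions move during the search.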