Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

📅 2025-05-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work studies infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when a policy is evaluated on a single observed trajectory rather than in expectation. The authors first establish fundamental results on single-trial policy optimization: which class of policies suffices for optimality, an equivalent MDP reformulation of the original problem, and the computational hardness of the problem. They then propose an online planning approach based on Monte-Carlo tree search (MCTS) to solve GUMDPs in this regime. Experiments across general-utility tasks show that the method outperforms relevant baselines.

📝 Abstract
In this work, we contribute the first approach to solve infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated based on a single trajectory. First, we provide some fundamental results regarding policy optimization in the single-trial regime, investigating which class of policies suffices for optimality, casting our problem as a particular MDP that is equivalent to our original problem, as well as studying the computational hardness of policy optimization in the single-trial regime. Second, we show how we can leverage online planning techniques, in particular a Monte-Carlo tree search algorithm, to solve GUMDPs in the single-trial regime. Third, we provide experimental results showcasing the superior performance of our approach in comparison to relevant baselines.
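The single-trial setting can be made concrete with a minimal sketch (a hypothetical illustration, not the paper's algorithm): the planning state is augmented with the discounted state-occupancy vector accumulated along the one executed trajectory, and a one-ply UCT search backs up the utility of completed rollouts. The toy ring MDP, the min-occupancy utility, and all names below are assumptions for illustration.

```python
import math
import random

# Toy 3-state ring MDP (hypothetical): "next" advances along the ring.
STATES = 3
ACTIONS = ("stay", "next")
GAMMA = 0.9
HORIZON = 15

def step(s, a):
    return s if a == "stay" else (s + 1) % STATES

def utility(occ):
    # Non-linear utility of the discounted occupancy vector: reward even
    # coverage of all states. Such an objective is not expressible as an
    # expected cumulative scalar reward, which is what makes it a GUMDP.
    return min(occ)

def rollout(s, occ, t):
    # Finish the single trajectory with random actions, then score it.
    occ = list(occ)
    while t < HORIZON:
        occ[s] += GAMMA ** t
        s = step(s, random.choice(ACTIONS))
        t += 1
    return utility(occ)

def plan(s, occ, t, iters=300):
    # One-ply UCT over the augmented planning state (s, occ, t): the
    # occupancy accumulated so far is part of the state, mirroring the
    # idea of reformulating the single-trial GUMDP as an equivalent MDP.
    visits = {a: 0 for a in ACTIONS}
    value = {a: 0.0 for a in ACTIONS}
    for _ in range(iters):
        total = sum(visits.values()) + 1
        a = max(ACTIONS, key=lambda x: float("inf") if visits[x] == 0
                else value[x] / visits[x]
                + math.sqrt(2 * math.log(total) / visits[x]))
        occ2 = list(occ)
        occ2[s] += GAMMA ** t  # account for the current visit
        visits[a] += 1
        value[a] += rollout(step(s, a), occ2, t + 1)
    return max(ACTIONS, key=lambda x: visits[x])

def run_episode(seed=0):
    # Online planning loop: replan at every step of the single trajectory.
    random.seed(seed)
    s, occ = 0, [0.0] * STATES
    for t in range(HORIZON):
        a = plan(s, occ, t)
        occ[s] += GAMMA ** t
        s = step(s, a)
    return occ

occ = run_episode()
print(utility(occ))  # positive once every state has been visited
```

Note the design point: because the utility is evaluated on the one realized trajectory, the accumulated occupancy must travel with the search state; a memoryless policy over raw states alone would be insufficient in general.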
Problem

Research questions and friction points this paper is trying to address.

Solving infinite-horizon general-utility MDPs in the single-trial regime
Characterizing policy optimization and its computational hardness under single-trial evaluation
Leveraging online planning techniques, in particular Monte-Carlo tree search
Innovation

Methods, ideas, or system contributions that make the work stand out.

First approach to solving infinite-horizon GUMDPs in the single-trial regime
An equivalent MDP formulation and a Monte-Carlo tree search algorithm for online planning
Superior empirical performance against relevant baselines