Entropic Risk-Aware Monte Carlo Tree Search

📅 2026-01-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses planning in risk-aware Markov decision processes (MDPs) whose objective is the entropic risk measure (ERM). The authors propose a Monte Carlo tree search (MCTS) algorithm that embeds existing dynamic programming formulations for ERM objectives in an upper confidence bound (UCB)-based tree search. A non-asymptotic analysis shows that the empirical ERM at the root node converges to the optimal ERM and that the algorithm enjoys polynomial regret concentration. Illustrative experiments compare the method against relevant baselines.

πŸ“ Abstract
We propose a provably correct Monte Carlo tree search (MCTS) algorithm for solving risk-aware Markov decision processes (MDPs) with entropic risk measure (ERM) objectives. We provide a non-asymptotic analysis of our proposed algorithm, showing that the algorithm: (i) is correct in the sense that the empirical ERM obtained at the root node converges to the optimal ERM; and (ii) enjoys polynomial regret concentration. Our algorithm successfully exploits the dynamic programming formulations for solving risk-aware MDPs with ERM objectives introduced by previous works in the context of an upper confidence bound-based tree search algorithm. Finally, we provide a set of illustrative experiments comparing our risk-aware MCTS method against relevant baselines.
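As a rough illustration of the quantity the abstract calls the empirical ERM, the sketch below estimates an entropic risk measure from sampled returns. It assumes the common reward-based convention ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)], where beta > 0 is risk-averse; the paper's exact sign convention and estimator are not stated here, and the function name `entropic_risk` is hypothetical.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk of sampled returns (assumed convention).

    Computes -(1/beta) * log(mean(exp(-beta * x))) using a max-shift
    (log-sum-exp) so large |beta * x| does not overflow.
    beta > 0 is risk-averse, beta < 0 is risk-seeking, and beta == 0
    falls back to the plain sample mean (the limiting case).
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if beta == 0:
        return sum(returns) / len(returns)
    scaled = [-beta * x for x in returns]
    shift = max(scaled)
    # log of the sample mean of exp(scaled), computed stably
    log_mean_exp = shift + math.log(
        sum(math.exp(s - shift) for s in scaled) / len(scaled)
    )
    return -log_mean_exp / beta


samples = [0.0, 2.0]
mean_return = sum(samples) / len(samples)
risk_averse = entropic_risk(samples, beta=1.0)    # falls below the mean
risk_seeking = entropic_risk(samples, beta=-1.0)  # rises above the mean
```

Under this convention a deterministic return is left unchanged, while variability lowers the value for beta > 0, which is why a risk-averse planner prefers the less variable of two actions with equal mean return.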
Problem

Research questions and friction points this paper is trying to address.

risk-aware
Markov decision processes
entropic risk measure
Monte Carlo tree search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropic Risk Measure
Risk-Aware MDP
Monte Carlo Tree Search
Non-Asymptotic Analysis
Polynomial Regret Concentration