Entropic Risk-Aware Monte Carlo Tree Search

📅 2026-01-25

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

This work addresses the planning problem in risk-sensitive Markov decision processes (MDPs) with entropy-regularized risk measures (ERM) as the optimization objective. The authors propose a Monte Carlo tree search (MCTS) algorithm that integrates upper confidence bound (UCB) strategies with dynamic programming. Notably, this is the first work to establish non-asymptotic correctness guarantees and polynomial regret bounds for risk-sensitive MCTS, rigorously proving that the empirical ERM at the root node converges to the optimal value. Both theoretical analysis and empirical evaluations demonstrate that the proposed algorithm outperforms existing baselines in risk-sensitive decision-making tasks, achieving a strong balance between theoretical rigor and practical effectiveness.

Technology Category

Application Category

📝 Abstract

We propose a provably correct Monte Carlo tree search (MCTS) algorithm for solving risk-aware Markov decision processes (MDPs) with entropic risk measure (ERM) objectives. We provide a non-asymptotic analysis of our proposed algorithm, showing that the algorithm: (i) is correct in the sense that the empirical ERM obtained at the root node converges to the optimal ERM; and (ii) enjoys polynomial regret concentration. Our algorithm successfully exploits the dynamic programming formulations for solving risk-aware MDPs with ERM objectives introduced by previous works in the context of an upper confidence bound-based tree search algorithm. Finally, we provide a set of illustrative experiments comparing our risk-aware MCTS method against relevant baselines.

Problem

Research questions and friction points this paper is trying to address.

risk-aware

Markov decision processes

entropic risk measure

Monte Carlo tree search

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropic Risk Measure

Risk-Aware MDP

Monte Carlo Tree Search

Non-Asymptotic Analysis