AI Summary
This work addresses offline planning for deterministic partially observable Markov decision processes (DetPOMDPs), a setting where the agent's state is uncertain, yet both transitions and observations are deterministic. We propose the first framework adapting Monte Carlo Value Iteration (MCVI) to DetPOMDPs, explicitly leveraging determinism in dynamics and observations to construct compact, interpretable finite-state controllers (FSCs). Our method significantly improves solution success rates and robustness on large-scale DetPOMDPs. Empirically, it outperforms existing approaches on standard benchmarks and a real-world mobile-robot mapping task in forest environments, achieving both high success probability and strong generalization across unseen scenarios.
Abstract
Deterministic partially observable Markov decision processes (DetPOMDPs) often arise in planning problems where the agent is uncertain about its environmental state but can act and observe deterministically. In this paper, we propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs, which builds policies in the form of finite-state controllers (FSCs). DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs. We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.
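The finite-state controllers mentioned above can be pictured as a small graph: each node prescribes an action, and edges labeled by observations select the next node. The sketch below is illustrative only; the class names, the `step` interface, and the toy controller are assumptions, not the paper's implementation.

```python
# Minimal sketch of a finite-state controller (FSC), the policy
# representation DetMCVI builds. Illustrative only; not the paper's code.

class FSCNode:
    def __init__(self, action, edges=None):
        self.action = action      # action prescribed by this node
        self.edges = edges or {}  # observation -> index of next node

class FSC:
    def __init__(self, nodes, start=0):
        self.nodes = nodes
        self.start = start

    def run(self, step, max_steps=100):
        """Execute the controller against an environment callback.

        `step(action)` is assumed to return `(observation, done)`;
        in a DetPOMDP both the transition and the observation are
        deterministic functions of the hidden state and action.
        """
        i = self.start
        trace = []
        for _ in range(max_steps):
            node = self.nodes[i]
            obs, done = step(node.action)
            trace.append((node.action, obs))
            if done or obs not in node.edges:
                break
            i = node.edges[obs]
        return trace
```

For example, a two-node controller might keep moving while it observes `"clear"` and switch to a turning node when it observes `"wall"`; because dynamics are deterministic, the same controller reproduces the same trajectory from the same hidden start state.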