🤖 AI Summary
Household service robots struggle to execute ambiguous instructions, localize occluded objects, and plan over open-vocabulary object categories in real-world environments. Method: We propose a hierarchical framework that integrates large language models (LLMs) with Bayesian partially observable Markov decision processes (POMDPs). It combines LLM prompting, particle filtering, hierarchical hypothesis generation, open-domain POMDP modeling, Bayesian belief updating, and Monte Carlo tree search. Contribution/Results: The key innovations are (1) the “Tree of Hypotheses” (TOH) mechanism, presented as the first to introduce structured, verifiable particle-based beliefs guided by LLMs, and (2) what is presented as the first computationally tractable POMDP framework supporting open-world state spaces, enabling falsifiable and scalable belief tracking and planning. Evaluated on object rearrangement tasks across multiple kitchen environments, the method significantly outperforms state-of-the-art LLM-only and LLM-tree-search hybrid approaches, showing greater robustness to ambiguity and occlusion as well as higher planning efficiency.
📝 Abstract
Task planning under uncertainty is essential for home-service robots operating in the real world. Such tasks involve ambiguous human instructions, hidden or unknown object locations, and open-vocabulary object types, which together produce open-ended uncertainty and an unboundedly large planning space. To address these challenges, we propose Tru-POMDP, a planner that combines structured belief generation using Large Language Models (LLMs) with principled POMDP planning. Tru-POMDP introduces a hierarchical Tree of Hypotheses (TOH), which systematically queries an LLM to construct high-quality particle beliefs over possible world states and human goals. We further formulate an open-ended POMDP model that enables rigorous Bayesian belief tracking and efficient belief-space planning over these LLM-generated hypotheses. Experiments on complex object rearrangement tasks across diverse kitchen environments show that Tru-POMDP significantly outperforms state-of-the-art LLM-based and LLM-tree-search hybrid planners, achieving higher success rates, better plans, stronger robustness to ambiguity and occlusion, and greater planning efficiency.
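To make the idea of Bayesian belief tracking over LLM-generated hypotheses more concrete, the following is a minimal illustrative sketch, not the Tru-POMDP implementation. It assumes each particle is a simple (target object, location) guess with a weight, that observations are noise-free, and that each object exists exactly once. The names `Hypothesis` and `update_belief` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One particle (hypothetical structure): a guessed target and where it is hidden."""
    target: str
    location: str
    weight: float


def update_belief(particles, opened_location, observed_objects):
    """Reweight particles with Bayes' rule after the robot opens one location.

    Assumed likelihood model: perception is noise-free and every object is
    unique, so a hypothesis is consistent only if its prediction of whether
    the target is visible matches what was actually seen.
    """
    reweighted = []
    for p in particles:
        predicted_seen = p.location == opened_location
        actually_seen = p.target in observed_objects
        likelihood = 1.0 if predicted_seen == actually_seen else 0.0
        reweighted.append((p, p.weight * likelihood))

    total = sum(w for _, w in reweighted)
    if total == 0.0:
        # Every hypothesis was falsified; a caller would need fresh hypotheses.
        return []
    # Keep only surviving particles and renormalize their weights.
    return [Hypothesis(p.target, p.location, w / total) for p, w in reweighted if w > 0.0]


if __name__ == "__main__":
    belief = [
        Hypothesis("apple", "fridge", 0.5),
        Hypothesis("apple", "cabinet", 0.3),
        Hypothesis("orange", "fridge", 0.2),
    ]
    # The robot opens the fridge and sees only milk, which falsifies both
    # fridge hypotheses and leaves the cabinet hypothesis.
    for h in update_belief(belief, "fridge", {"milk"}):
        print(h)
```

In this sketch an empty result signals that all hypotheses were falsified, which is the point at which a planner of this kind would plausibly ask the LLM for new hypotheses. How Tru-POMDP actually handles that case is not stated in the abstract.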