🤖 AI Summary
This work addresses the difficulty large language models have in explicitly modeling nested beliefs during higher-order Theory of Mind (ToM) reasoning; existing prompting approaches rely on observable-event filtering or temporal belief chains and do not model nesting explicitly. The authors propose RecToM, an inference-time framework built on recursive perspective construction: following the character chain specified in the question, it builds each character's perspective from the preceding one, reducing a higher-order belief question to an actual-world question posed within the final constructed perspective. A KD45 modal logic analysis shows that this construction induces a well-formed belief modality rather than simple event filtering. Evaluated on multiple ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, and across multiple LLM backbones, RecToM achieves state-of-the-art results, reaching 100% accuracy on Hi-ToM with both GPT-5.4 and Qwen3.5.
📝 Abstract
Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character's perspective from the preceding character's perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality that goes beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100% accuracy on Hi-ToM, a benchmark requiring higher-order ToM reasoning, with both GPT-5.4 and Qwen3.5.
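The recursion over the character chain can be illustrated with a small toy sketch. This is not the authors' implementation: RecToM constructs perspectives by prompting an LLM, whereas the sketch below substitutes a deterministic witness filter as a simplifying assumption. The names `Event`, `construct_perspective`, `recursive_perspective`, and `answer_location`, as well as the two-event story, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One story event: an object ends up at a location, seen by some characters."""
    obj: str
    location: str
    witnesses: frozenset


def construct_perspective(events, character):
    # Assumption: a character's perspective is the sub-story they witnessed.
    # In RecToM this step is performed by an LLM, not by a fixed filter.
    return [e for e in events if character in e.witnesses]


def recursive_perspective(events, chain):
    # Build each perspective from the preceding one, following the chain
    # in question order ("A thinks B thinks ..." -> ["A", "B", ...]).
    if not chain:
        return events
    inner = construct_perspective(events, chain[0])
    return recursive_perspective(inner, chain[1:])


def answer_location(events, obj):
    # Actual-world question inside a perspective: where was obj last placed?
    for e in reversed(events):
        if e.obj == obj:
            return e.location
    return None


# Hypothetical story: Sally and Anne see the ball go into the basket;
# after Sally leaves, only Anne sees it moved to the box.
story = [
    Event("ball", "basket", frozenset({"Sally", "Anne"})),
    Event("ball", "box", frozenset({"Anne"})),
]

# "Where does Anne think Sally thinks the ball is?"
second_order = answer_location(recursive_perspective(story, ["Anne", "Sally"]), "ball")
print(second_order)
```

With an empty chain the function returns the full story, so the same `answer_location` call answers both the reality question and belief questions of any order; only the perspective it is asked within changes.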