🤖 AI Summary
This study investigates whether large language models (LLMs) possess planning theory of mind (PToM): the ability to infer others' beliefs and desires and use those inferences to devise multi-step persuasive strategies that change their behavior.
Method: The authors introduce MindGames, a novel PToM evaluation in which agents must actively intervene on an interlocutor's mental states; controlled conditions separate mental-state inference from pure sequential planning.
Contribution/Results: Human participants significantly outperform the o1-preview model on the PToM task, which requires mental-state inference (11% higher success rate, p = 0.006), whereas o1-preview outperforms humans on a baseline condition that demands comparable planning but minimal mental-state inference. These results suggest a substantial gap between current LLMs and human-like social reasoning, consistent with the hypothesis that humans draw on an implicit causal model of other agents.
📝 Abstract
Recent evidence suggests Large Language Models (LLMs) display Theory of Mind (ToM) abilities. Most ToM experiments place participants in a spectatorial role, wherein they predict and interpret other agents' behavior. However, human ToM also contributes to dynamically planning action and strategically intervening on others' mental states. We present MindGames: a novel "planning theory of mind" (PToM) task which requires agents to infer an interlocutor's beliefs and desires to persuade them to alter their behavior. Unlike previous evaluations, we explicitly evaluate use cases of ToM. We find that humans significantly outperform o1-preview (an LLM) at our PToM task (11% higher; $p=0.006$). We hypothesize this is because humans have an implicit causal model of other agents (e.g., they know, as our task requires, to ask about people's preferences). In contrast, o1-preview outperforms humans in a baseline condition which requires a similar amount of planning but minimal mental state inferences (e.g., o1-preview is better than humans at planning when already given someone's preferences). These results suggest a significant gap between human-like social reasoning and LLM abilities.
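To make the headline comparison concrete, below is a minimal sketch of how a difference in success rates between humans and a model could be tested for significance. The counts are purely hypothetical (this section reports only the 11-point gap and p = 0.006, not sample sizes), and the paper's actual statistical procedure is not specified here; the sketch assumes a two-proportion z-test via `statsmodels`.

```python
# Hedged sketch: two-proportion z-test on hypothetical success counts.
# The numbers (80/100 vs. 69/100) are illustrative only and do not reproduce
# the paper's reported statistics; the authors' actual test and sample sizes
# are not given in this section.
from statsmodels.stats.proportion import proportions_ztest

human_successes, human_trials = 80, 100   # hypothetical human PToM success rate: 80%
model_successes, model_trials = 69, 100   # hypothetical o1-preview success rate: 69% (11 points lower)

z_stat, p_value = proportions_ztest(
    count=[human_successes, model_successes],
    nobs=[human_trials, model_trials],
    alternative="two-sided",
)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```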