🤖 AI Summary
This paper challenges the prevailing assumption in AI theory of mind (ToM) research that behavioral mimicry constitutes evidence of genuine mental modeling. It argues that large language models’ “human-level” performance on standard ToM tasks reflects statistical pattern matching rather than attribution of internal mental states; more fundamentally, evaluating AI using paradigms derived from individual human cognition entails a methodological flaw. To address this, the paper introduces *Reciprocal Theory of Mind*, a novel framework that shifts evaluation from isolated, unidirectional AI output to bidirectional understanding generation within real-time human–AI interaction. Methodologically, it integrates behavioral experiments with LLMs, critical analysis of cognitive science theories, and philosophical conceptual clarification. Its core contributions are: (1) deconstructing the implicit cognitive assumptions underlying current ToM assessments; (2) exposing the irreducible epistemic gap between observable behavior and subjective experience; and (3) advancing AI cognitive science toward a dynamic, interaction-centered paradigm.
📝 Abstract
When researchers claim AI systems possess ToM or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results reflect only behavioral mimicry. More importantly, the entire testing paradigm may be flawed: it applies tests designed for individual human cognition to AI systems, rather than assessing cognition directly in the moment of human-AI interaction. I suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms, emphasizing the interaction dynamics, instead of testing AI in isolation.