🤖 AI Summary
This study addresses the limitations of traditional false belief tasks (FBTs) for evaluating social cognition in large language models, particularly data contamination and insufficient experimental control. By assessing 17 open-weight models across 192 balanced FBT variants and combining Bayesian logistic regression with vector steering, the work systematically examines the effects of model scale, instruction tuning, and reasoning-focused fine-tuning. It shows for the first time that explicit propositional attitude markers (e.g., "X believes") significantly alter model response patterns, a behavior driven by a "think" vector acquired during pretraining, indicating that social reasoning tendencies emerge before fine-tuning. The findings show that while larger model scale generally improves performance, the relationship is non-monotonic; instruction tuning mitigates the marker-induced cross-over effect, whereas reasoning-focused fine-tuning exacerbates it.
📝 Abstract
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues such as data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al. 2023), using Bayesian logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size benefits performance, though not monotonically. A cross-over effect reveals that explicating propositional attitudes ("X thinks") fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented fine-tuning amplifies it. In a case study analysing social reasoning ability throughout OLMo 2 training, we show that this cross-over effect emerges during pre-training, suggesting that models acquire stereotypical response patterns tied to mental-state vocabulary that can outweigh other scenario semantics. Finally, vector steering allows us to isolate a "think" vector as the causal driver of the observed FBT behaviour.