🤖 AI Summary
Existing emotion recognition datasets predominantly rely on explicit emotional cues, limiting their ability to assess models’ capacity for implicit contextual emotion reasoning. To address this, we propose the first Theory of Mind (ToM)-grounded emotion evaluation dataset supporting bidirectional inference—forward (situation → emotion) and backward (emotion → situation)—informed by ToM and cognitive appraisal theory. We systematically integrate cognitive appraisal dimensions into LLM evaluation frameworks for the first time, combining psychology-driven data sampling with human-curated fine-grained annotations to design controllable, interpretable reasoning tasks. Empirical evaluation reveals that while mainstream LLMs exhibit rudimentary emotion reasoning, they consistently fail to accurately map situational outcomes to emotions and underlying appraisal dimensions (e.g., goal relevance, agency, fairness). Our work demonstrates the necessity and efficacy of grounding LLM emotion intelligence assessment in cognitive theories to enhance scientific rigor and diagnostic precision.
📝 Abstract
Datasets used for emotion recognition tasks typically contain overt cues that can be used to predict the emotions expressed in a text. However, texts sometimes contain covert contextual cues that are rich in affective semantics, and inferring emotional states from such cues requires higher-order reasoning rather than simply recognizing the emotions explicitly conveyed. This study advances beyond surface-level perceptual features to investigate how large language models (LLMs) reason about others' emotional states using contextual information, within a Theory-of-Mind (ToM) framework. Grounded in Cognitive Appraisal Theory, we curate a specialized ToM evaluation dataset to assess both forward reasoning (from context to emotion) and backward reasoning (from emotion to inferred context). We show that LLMs can reason about emotions to a certain extent, although they are poor at associating situational outcomes and appraisals with specific emotions. Our work highlights the need for psychological theories in the training and evaluation of LLMs in the context of emotion reasoning.