🤖 AI Summary
This study investigates the controllability of affective expression—specifically arousal and valence—in long-horizon, multi-turn dialogues generated by large language models (LLMs). Addressing the lack of systematic quantitative analysis in prior work, we propose a novel LLM-driven affective analysis framework integrated with multi-agent dialogue simulation: (1) generating multi-turn dialogue trajectories using open-source LLMs; (2) leveraging LLMs for self-supervised affective annotation; and (3) modeling and statistically analyzing affective dynamics within the arousal–valence space. Our experiments, the first of their kind, systematically reveal significant inter-model disparities—and shared bottlenecks—across mainstream LLMs in affective stability, capacity to generate extreme emotions, and contextual affective consistency (e.g., affective drift). The findings establish a reproducible evaluation benchmark and provide empirical grounding for designing affectively controllable LLMs.
📝 Abstract
This paper investigates the challenges of affect control in large language models (LLMs), focusing on their ability to express appropriate emotional states during extended dialogues. We evaluate state-of-the-art open-weight LLMs to assess their affective expressive range in terms of arousal and valence. Our study employs a novel methodology combining LLM-based sentiment analysis with multi-turn dialogue simulations between LLMs. We quantify the models' capacity to express a wide spectrum of emotions and how their affective states fluctuate during interactions. Our findings reveal significant variation among LLMs in their ability to maintain consistent affect, with some models demonstrating more stable emotional trajectories than others. Furthermore, we identify key challenges in affect control, including difficulty in producing and sustaining extreme emotional states and limitations in adapting affect to changing conversational contexts. These findings have important implications for the development of more emotionally intelligent AI systems and highlight the need for improved affect modeling in LLMs.
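The affective-dynamics analysis described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulas: it assumes each dialogue turn has already been annotated by an LLM with (arousal, valence) scores in [-1, 1], and the stability, intensity, and drift metrics are hypothetical stand-ins for the measures the study uses.

```python
# Illustrative metrics over an arousal-valence trajectory from a multi-turn
# dialogue. All metric definitions here are assumptions for exposition.
from statistics import mean, pstdev

def affect_metrics(trajectory):
    """trajectory: list of (arousal, valence) pairs, one per dialogue turn."""
    arousal = [a for a, _ in trajectory]
    valence = [v for _, v in trajectory]
    half = len(valence) // 2
    return {
        # Stability: lower standard deviation = steadier emotional trajectory.
        "arousal_stability": pstdev(arousal),
        "valence_stability": pstdev(valence),
        # Extreme-emotion capacity: how far the model ventures from neutral (0).
        "max_intensity": max(max(map(abs, arousal)), max(map(abs, valence))),
        # Affective drift: net valence change from the first to the second
        # half of the conversation.
        "valence_drift": mean(valence[half:]) - mean(valence[:half]),
    }

# Hypothetical annotated dialogue in which valence gradually decays,
# i.e. an instance of affective drift.
traj = [(0.2, 0.8), (0.3, 0.6), (0.2, 0.4), (0.4, 0.1), (0.3, -0.2)]
print(affect_metrics(traj))
```

Comparing these per-trajectory statistics across models is one way to quantify the inter-model disparities in affective stability and drift that the study reports.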