Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models

📅 2025-11-12
🤖 AI Summary
Existing LLM evaluations of emotional support rely predominantly on static, short-turn dialogues and fail to capture the dynamic, longitudinal evolution of human emotions. To address this, we propose an evaluation framework designed explicitly around emotional dynamic trajectories. User emotional trajectories are modeled as a first-order Markov process, and a large-scale benchmark of 328 emotional contexts and 1,152 disturbance events simulates realistic emotional shifts as dialogue scenarios evolve. Model responses are constrained to psychologically validated emotion regulation strategies, such as situation selection and cognitive reappraisal, and user emotional states are tracked with causally adjusted estimation. We introduce three trajectory-level metrics: BEL (Baseline Emotional Level), ETV (Emotional Trajectory Volatility), and ECP (Emotional Centroid Position). Extensive evaluation across diverse LLMs shows that the framework discriminates between models' long-term emotional support capabilities, revealing significant disparities and yielding interpretable, actionable assessments of empathic interaction.
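The summary treats user emotion as a first-order Markov process perturbed by disturbance events. A minimal Python sketch of such a process is shown here, assuming a hypothetical five-level emotion scale, two hand-picked transition matrices (one for turns with a supportive reply, one for turns hit by a disturbance event), and a hypothetical `simulate` helper. The paper's actual state space, probabilities, and the way model responses enter the transition are not specified in this summary.

```python
import random

# Hypothetical five-level emotion scale (index 0 = most negative, 4 = most positive).
STATES = ["very negative", "negative", "neutral", "positive", "very positive"]

# Assumed transition probabilities for a turn with a supportive reply:
# row = current state, column = next state; mass tilts toward improvement.
SUPPORT = [
    [0.5, 0.4, 0.1, 0.0, 0.0],
    [0.1, 0.5, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.5, 0.4, 0.0],
    [0.0, 0.0, 0.1, 0.6, 0.3],
    [0.0, 0.0, 0.0, 0.2, 0.8],
]

# Assumed transition probabilities for a turn hit by a disturbance event:
# mass tilts toward dropping one level.
DISTURB = [
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.4],
]


def simulate(start: int, turns: int, disturbance_turns: set[int], seed: int = 0) -> list[int]:
    """Sample a trajectory of state indices with the first-order Markov property:
    each next state is drawn using only the current state's row of the
    matrix that applies to this turn, never the earlier history."""
    rng = random.Random(seed)
    trajectory = [start]
    for t in range(1, turns + 1):
        matrix = DISTURB if t in disturbance_turns else SUPPORT
        row = matrix[trajectory[-1]]
        trajectory.append(rng.choices(range(len(STATES)), weights=row)[0])
    return trajectory


if __name__ == "__main__":
    traj = simulate(start=1, turns=8, disturbance_turns={3, 6})
    print([STATES[i] for i in traj])
```

Swapping in different matrices per turn keeps the process Markov (time-inhomogeneous), which is one simple way to inject scheduled disturbance events without tracking history.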

📝 Abstract
Emotional support is a core capability in human-AI interaction, with applications including psychological counseling, role play, and companionship. However, existing evaluations of large language models (LLMs) often rely on short, static dialogues and fail to capture the dynamic and long-term nature of emotional support. To overcome this limitation, we shift from snapshot-based evaluation to trajectory-based assessment, adopting a user-centered perspective that evaluates models based on their ability to improve and stabilize user emotional states over time. Our framework constructs a large-scale benchmark consisting of 328 emotional contexts and 1,152 disturbance events, simulating realistic emotional shifts under evolving dialogue scenarios. To encourage psychologically grounded responses, we constrain model outputs using validated emotion regulation strategies such as situation selection and cognitive reappraisal. User emotional trajectories are modeled as a first-order Markov process, and we apply causally adjusted emotion estimation to obtain unbiased emotional state tracking. Based on this framework, we introduce three trajectory-level metrics: Baseline Emotional Level (BEL), Emotional Trajectory Volatility (ETV), and Emotional Centroid Position (ECP). These metrics collectively capture user emotional dynamics over time and support comprehensive evaluation of long-term emotional support performance of LLMs. Extensive evaluations across a diverse set of LLMs reveal significant disparities in emotional support capabilities and provide actionable insights for model development.
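The abstract names three trajectory-level metrics without giving their formulas. A minimal Python sketch of plausible stand-ins for two of them follows, under explicitly assumed definitions: BEL as the mean emotion score over the trajectory, and ETV as the population standard deviation of turn-to-turn changes. Both definitions and the function names `baseline_emotional_level` and `emotional_trajectory_volatility` are hypothetical; ECP is left out because only its name is available here.

```python
from statistics import fmean, pstdev


def baseline_emotional_level(trajectory: list[float]) -> float:
    """Hypothetical BEL: the average emotion score across all turns."""
    return fmean(trajectory)


def emotional_trajectory_volatility(trajectory: list[float]) -> float:
    """Hypothetical ETV: population standard deviation of turn-to-turn changes.
    A trajectory that moves by the same amount every turn scores 0."""
    deltas = [b - a for a, b in zip(trajectory, trajectory[1:])]
    return pstdev(deltas) if deltas else 0.0


if __name__ == "__main__":
    steady = [0.0, 1.0, 2.0, 3.0]   # improves by one level every turn
    erratic = [0.0, 2.0, 0.0, 2.0]  # swings up and down
    for name, traj in [("steady", steady), ("erratic", erratic)]:
        print(name, baseline_emotional_level(traj), emotional_trajectory_volatility(traj))
```

Under these assumed definitions, a model that lifts the user's mood scores a higher BEL, and one that stabilizes it scores a lower ETV; the paper's own formulas may normalize or weight turns differently.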
Problem

Research questions and friction points this paper is trying to address.

Evaluating emotional support in language models using dynamic trajectories instead of static snapshots
Capturing long-term emotional state improvements and stabilization through user-centered assessment
Addressing limitations of short dialogue evaluations with psychologically grounded response strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory-based emotional assessment using Markov modeling
Constraining responses with validated emotion regulation strategies
Introducing three trajectory-level metrics for comprehensive evaluation
Zhouxing Tan
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Ruochong Xiong
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Yulong Wan
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Jinlong Ma
Guangzhou Quwan Network Technology, Guangzhou, China
Hanlin Xue
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Qichun Deng
Guangzhou Quwan Network Technology, Guangzhou, China
Haifeng Jing
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Zhengtong Zhang
Guangzhou Quwan Network Technology, Guangzhou, China
Depei Liu
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Shiyuan Luo
University of Pittsburgh
AI4Science, Machine Learning, Data Mining
Junfei Liu
National Engineering Research Center for Software Engineering, Peking University, Beijing, China