🤖 AI Summary
To address the challenges of modeling emotional evolution and causal reasoning in long dialogues, this paper proposes CauseMotion, a multimodal Retrieval-Augmented Generation (RAG) framework that jointly leverages textual and acoustic features (e.g., vocal emotion, emotional intensity, speech rate). It introduces a sliding-window retrieval mechanism that enables causal-chain modeling across dozens of dialogue turns, and the authors construct the first fine-grained emotional-causality benchmark dataset, featuring dialogues exceeding 70 turns. On this benchmark, CauseMotion-GLM-4 improves causal accuracy by 8.7% over the base GLM-4 and surpasses GPT-4o by 1.2%; on the public DiaASQ dataset, it attains state-of-the-art accuracy, F1 score, and causal reasoning accuracy. The core innovations are (1) multimodal causality-aware RAG and (2) a long-range sliding-window retrieval mechanism tailored to extended conversational contexts.
📝 Abstract
Long-sequence causal reasoning seeks to uncover causal relationships within extended time-series data but is hindered by complex dependencies and the difficulty of validating causal links. To address the limitations of large-scale language models (e.g., GPT-4) in capturing intricate emotional causality within extended dialogues, we propose CauseMotion, a long-sequence emotional causal reasoning framework grounded in Retrieval-Augmented Generation (RAG) and multimodal fusion. Unlike conventional methods that rely only on textual information, CauseMotion enriches semantic representations by incorporating audio-derived features (vocal emotion, emotional intensity, and speech rate) into the textual modality. By integrating RAG with a sliding-window mechanism, it effectively retrieves and leverages contextually relevant dialogue segments, enabling the inference of complex emotional causal chains spanning multiple conversational turns. To evaluate its effectiveness, we constructed the first benchmark dataset dedicated to long-sequence emotional causal reasoning, featuring dialogues with over 70 turns. Experimental results demonstrate that the proposed RAG-based multimodal approach substantially enhances both the depth of emotional understanding and the causal inference capabilities of large-scale language models. A GLM-4 model integrated with CauseMotion achieves an 8.7% improvement in causal accuracy over the original model and surpasses GPT-4o by 1.2%. Additionally, on the publicly available DiaASQ dataset, CauseMotion-GLM-4 achieves state-of-the-art results in accuracy, F1 score, and causal reasoning accuracy.
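The sliding-window retrieval idea can be illustrated with a minimal sketch: split a long dialogue into overlapping windows of turns, score each window against a query, and return the top-k segments as retrieval context. Everything below is illustrative; the paper's actual window size, stride, embedding model, and scoring function are not specified here, so a simple bag-of-words cosine similarity stands in for a learned retriever.

```python
# Minimal sketch of sliding-window retrieval over a long dialogue.
# Window size, stride, and the bag-of-words scorer are illustrative
# assumptions, not the paper's actual configuration.
from collections import Counter
import math

def windows(turns, size=8, stride=4):
    """Split a list of dialogue turns into overlapping windows."""
    out = []
    for start in range(0, max(len(turns) - size + 1, 1), stride):
        out.append((start, turns[start:start + size]))
    return out

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(turns, query, size=8, stride=4, k=3):
    """Return the top-k windows most relevant to the query,
    each as (score, start_index, window_turns)."""
    q = Counter(query.lower().split())
    scored = []
    for start, win in windows(turns, size, stride):
        bag = Counter(" ".join(win).lower().split())
        scored.append((cosine(q, bag), start, win))
    scored.sort(key=lambda t: -t[0])
    return scored[:k]
```

In a RAG pipeline, the retrieved windows (optionally augmented with per-turn acoustic features such as emotion labels or speech rate) would be concatenated into the language model's prompt, letting it reason over causally relevant segments far apart in a 70+ turn conversation without exceeding its context budget.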