🤖 AI Summary
This work addresses fine-grained sentiment understanding in multimodal dialogues, tackling two core challenges: (1) joint extraction of the cross-speaker sentiment sextuple (holder, target, aspect, opinion, sentiment, and rationale) and (2) precise detection of sentiment flipping, i.e., dynamic sentiment shifts and their triggering causes. We propose a structured stepwise prompting mechanism to guide large language models (LLMs) through hierarchical sentiment element parsing. Additionally, we design a multi-LLM complementary ensemble framework that integrates multimodal contextual modeling with sequential component analysis to capture sentiment dynamics. Experiments demonstrate state-of-the-art performance: a 47.38% average score for sextuple extraction and a 74.12% exact-match F1 for sentiment flipping detection, substantially outperforming existing baselines. To our knowledge, this is the first work to systematically address structured sentiment evolution modeling in multimodal dialogues.
📝 Abstract
Understanding sentiment in multimodal conversations is a complex yet crucial challenge in building emotionally intelligent AI systems. The Multimodal Conversational Aspect-based Sentiment Analysis (MCABSA) Challenge invited participants to tackle two demanding subtasks: (1) extracting a comprehensive sentiment sextuple (holder, target, aspect, opinion, sentiment, and rationale) from multi-speaker dialogues, and (2) sentiment flipping analysis, which identifies dynamic sentiment shifts and their underlying triggers. For Subtask-I, we designed a structured prompting pipeline that guides large language models (LLMs) to extract the sentiment components sequentially with refined contextual understanding. For Subtask-II, we ensembled three LLMs to exploit their complementary strengths and robustly identify sentiment transitions and their triggers. Our system achieved a 47.38% average score on Subtask-I and a 74.12% exact-match F1 on Subtask-II, showing the effectiveness of step-wise refinement and ensemble strategies in rich, multimodal sentiment analysis tasks.
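To make the Subtask-II idea concrete, the following is a minimal sketch of one way three LLM outputs could be combined and then scored by exact match. The abstract does not say how the ensemble is built or how the metric is computed, so everything here is an assumption for illustration: the majority-vote rule, the `Flip` record layout, and the function names `majority_vote` and `exact_match_f1` are hypothetical and are not taken from the authors' system or the challenge's official scorer.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical flip record (field layout assumed for illustration only):
# (holder, target, aspect, initial_sentiment, flipped_sentiment, trigger)
Flip = Tuple[str, str, str, str, str, str]


def majority_vote(predictions: List[Set[Flip]], min_votes: int = 2) -> Set[Flip]:
    """Keep every flip record proposed by at least `min_votes` models."""
    counts = Counter(flip for model_output in predictions for flip in model_output)
    return {flip for flip, votes in counts.items() if votes >= min_votes}


def exact_match_f1(predicted: Set[Flip], gold: Set[Flip]) -> float:
    """F1 where a prediction counts only if every field equals a gold record."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy records, invented for this example (not data from the challenge).
flip_a: Flip = ("Alice", "phone", "battery", "positive", "negative", "personal experience")
flip_b: Flip = ("Bob", "phone", "screen", "negative", "positive", "logical argument")
flip_c: Flip = ("Alice", "phone", "camera", "neutral", "negative", "participant feedback")

# One prediction set per (hypothetical) LLM.
model_outputs = [{flip_a, flip_b}, {flip_a}, {flip_a, flip_c}]

ensembled = majority_vote(model_outputs)          # only flip_a has >= 2 votes
score = exact_match_f1(ensembled, {flip_a, flip_b})  # precision 1.0, recall 0.5
```

In this toy case the vote keeps only the record all three models agree on, which raises precision at the cost of recall; a lower `min_votes` would trade in the other direction.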