🤖 AI Summary
This paper addresses two key challenges in multimodal emotion recognition: (1) the difficulty of integrating trustworthy prior knowledge, and (2) severe class imbalance. To tackle them, we propose a balanced dual-contrastive learning framework guided by trustworthy reasoning trajectories. Methodologically: (1) we use Gemini to generate fine-grained, modality-separable reasoning trajectories as trustworthy priors, and inject the modality-specific reasoning cues into cross-modal interactions through a lightweight fusion network; (2) we design a balanced dual-contrastive loss that jointly optimizes inter-class discriminability and intra-class compactness, mitigating the long-tailed class distribution. On the MER2024 benchmark, our approach achieves significant performance gains over state-of-the-art methods. Ablation studies and cross-dataset experiments further demonstrate the effectiveness and generalizability of the trustworthy prior modeling and the domain-adaptive fusion mechanism.
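The prior-injection step in (1) can be pictured with a minimal NumPy sketch. The summary does not specify the fusion architecture, so every detail here is an assumption for illustration only. The function `prior_gated_fusion`, the gated residual injection, the weight shapes, and the parameter-free attention across modality tokens are hypothetical stand-ins, not the authors' network. The sketch also assumes that each modality's Gemini reasoning trace has already been encoded into a fixed-size vector.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def prior_gated_fusion(features, priors, params):
    """Hypothetical prior-injection fusion (illustrative, not the paper's architecture).

    features: modality -> (B, d) encoder features
    priors:   modality -> (B, d_r) embedding of that modality's reasoning trace (assumed)
    params:   modality -> (w_proj (d_r, d), w_gate (2d, d)) hypothetical weights
    Returns a (B, M * d) fused representation.
    """
    tokens = []
    for m in sorted(features):                                   # fixed modality order
        w_proj, w_gate = params[m]
        x = features[m]
        p = priors[m] @ w_proj                                   # prior projected into feature space
        g = sigmoid(np.concatenate([x, p], axis=-1) @ w_gate)    # per-dimension gate in (0, 1)
        tokens.append(x + g * p)                                 # gated residual injection of the prior
    h = np.stack(tokens, axis=1)                                 # (B, M, d): one token per modality
    # Parameter-free scaled dot-product attention across modality tokens
    # stands in for the (unspecified) cross-modal interaction layers.
    scores = h @ h.transpose(0, 2, 1) / np.sqrt(h.shape[-1])    # (B, M, M)
    scores -= scores.max(axis=-1, keepdims=True)                 # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                     # rows sum to 1
    out = attn @ h                                               # (B, M, d)
    return out.reshape(out.shape[0], -1)


# Toy usage with random weights (a trained model would learn them).
rng = np.random.default_rng(0)
batch, d, d_r = 4, 8, 16
modalities = ["audio", "text", "visual"]
features = {m: rng.standard_normal((batch, d)) for m in modalities}
priors = {m: rng.standard_normal((batch, d_r)) for m in modalities}
params = {m: (0.1 * rng.standard_normal((d_r, d)), 0.1 * rng.standard_normal((2 * d, d)))
          for m in modalities}
fused = prior_gated_fusion(features, priors, params)
print(fused.shape)  # (4, 24)
```

A gate is one way to let the fusion network down-weight a reasoning cue that disagrees with the observed signal. Plain addition would force every prior in at full strength.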
📝 Abstract
This study investigates how trustworthy prior reasoning knowledge from multimodal large language models (MLLMs) can be integrated into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronounced class imbalance in multimodal emotion recognition, we introduce Balanced Dual-Contrastive Learning, a loss formulation that jointly balances inter-class and intra-class distributions. On the MER2024 benchmark, our prior-enhanced framework yields substantial performance gains. This demonstrates that the reliability of MLLM-derived reasoning can be combined with the domain adaptability of lightweight fusion networks for robust, scalable emotion recognition.
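The abstract does not give the loss formula. The following NumPy sketch therefore shows only one plausible way to make both contrastive terms class-balanced. The function name, the learnable class prototypes, the temperature `tau`, and the weight `lam` are all assumptions. The intra-class term borrows the class-averaged denominator used in Balanced Contrastive Learning (BCL) for long-tailed recognition. The inter-class term is a prototype cross-entropy averaged per class.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project rows onto the unit sphere so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def balanced_dual_contrastive_loss(z, y, prototypes, tau=0.1, lam=1.0):
    """Hypothetical two-term balanced contrastive loss (not the paper's exact formula).

    z:          (N, d) fused embeddings for one mini-batch
    y:          (N,) integer emotion labels in [0, C)
    prototypes: (C, d) one learnable anchor per emotion class (assumed)
    """
    z, protos = l2_normalize(z), l2_normalize(prototypes)
    n, num_classes = z.shape[0], protos.shape[0]

    # Inter-class term: sample-to-prototype cross-entropy, averaged within each
    # class first so that every class present in the batch gets equal weight.
    logits = z @ protos.T / tau
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_prob[np.arange(n), y]
    inter = np.mean([ce[y == c].mean() for c in np.unique(y)])

    # Intra-class term: supervised contrastive loss whose denominator averages
    # exp-similarities within each class before summing (BCL-style), so
    # majority classes cannot dominate the normalizer.
    sim = z @ z.T / tau
    intra_terms = []
    for i in range(n):
        others = np.arange(n) != i
        denom = sum(np.exp(sim[i, others & (y == c)]).mean()
                    for c in range(num_classes) if np.any(others & (y == c)))
        pos = others & (y == y[i])
        if pos.any():  # singleton classes contribute no positive pairs
            intra_terms.append(np.mean(np.log(denom) - sim[i, pos]))
    intra = np.mean(intra_terms) if intra_terms else 0.0

    return inter + lam * intra


# Toy, deliberately imbalanced batch: five samples of class 0, two of class 1, one of class 2.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 0, 1, 1, 2])
emb = rng.standard_normal((8, 16))
protos = rng.standard_normal((3, 16))
loss = balanced_dual_contrastive_loss(emb, labels, protos)
print(float(loss))
```

Averaging within each class before combining keeps the five class-0 samples in the toy batch from dominating either term. This is one reading of "balancing inter-class and intra-class distributions", not necessarily the authors' formulation.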