🤖 AI Summary
Existing multimodal sentiment analysis methods suffer from two key limitations: sensitivity to unimodal noise and redundancy in cross-modal fusion. To address these, we propose a dual information bottleneck framework. First, a unimodal information bottleneck based on low-rank Rényi entropy compresses task-irrelevant noise while preserving discriminative features. Second, an attention-driven cross-modal information bottleneck dynamically selects complementary multimodal interactions and suppresses redundant fusion. The framework remains computationally tractable while significantly enhancing representation robustness and discriminability. Our method achieves state-of-the-art performance on CMU-MOSI (Acc-7: 47.4%) and CH-SIMS (F1: 81.63%, a 1.19% improvement over the second-best baseline). Under strong artificial noise, performance degrades by only 0.29–0.36%, demonstrating superior noise resilience and generalization.
📝 Abstract
Multimodal sentiment analysis has received significant attention across diverse research domains. Despite advances in algorithm design, existing approaches suffer from two critical limitations: 1) insufficient learning from noise-contaminated unimodal data, which corrupts cross-modal interactions, and 2) inadequate fusion of multimodal representations, which discards discriminative unimodal information while retaining redundant multimodal information. To address these challenges, this paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified, and compact multimodal representation. Implemented within the framework of the low-rank Rényi entropy functional, DIB offers enhanced robustness against diverse noise sources and computational tractability for high-dimensional data compared with conventional Shannon entropy-based methods. DIB comprises two key modules: 1) learning a sufficient and compressed representation of each unimodal input by maximizing task-relevant information and discarding superfluous information, and 2) ensuring the discriminative ability of the multimodal representation through a novel attention bottleneck fusion mechanism. Consequently, DIB yields a multimodal representation that filters out noisy information from unimodal data while capturing inter-modal complementarity. Extensive experiments on CMU-MOSI, CMU-MOSEI, CH-SIMS, and MVSA-Single validate the effectiveness of our method. The model achieves 47.4% accuracy under the Acc-7 metric on CMU-MOSI and an 81.63% F1-score on CH-SIMS, outperforming the second-best baseline by 1.19%. Under noise, it shows only 0.36% and 0.29% performance degradation on CMU-MOSI and CMU-MOSEI, respectively.
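Since the method centers on a low-rank Rényi entropy functional, a minimal sketch of how such quantities are typically estimated may help orient readers: the matrix-based Rényi α-entropy is computed from the eigenvalue spectrum of a trace-normalized kernel Gram matrix, and truncating that spectrum to its top-k eigenvalues (a low-rank approximation) keeps the estimator tractable in high dimensions. The function names, Gaussian kernel choice, and truncation scheme below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gram_matrix(x, sigma=1.0):
    # Trace-normalized Gram matrix from a Gaussian kernel; trace(A) == 1.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)

def renyi_entropy(a, alpha=2.0, rank=None):
    # Matrix-based Renyi alpha-entropy: S_alpha = 1/(1-alpha) * log2(sum_i lambda_i^alpha),
    # where lambda_i are eigenvalues of the normalized Gram matrix `a`.
    eigvals = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    if rank is not None:
        # Low-rank truncation: keep the top-`rank` eigenvalues, then renormalize
        # so the truncated spectrum still sums to one.
        eigvals = np.sort(eigvals)[::-1][:rank]
        eigvals = eigvals / eigvals.sum()
    return (1.0 / (1.0 - alpha)) * np.log2(np.sum(eigvals ** alpha))

# Toy usage: entropy of 64 random 16-dimensional feature vectors.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
h = renyi_entropy(gram_matrix(x), alpha=2.0, rank=32)
```

An information bottleneck objective would then trade off terms of this form, e.g. maximizing entropy shared with the labels while penalizing the entropy retained from the noisy input, with the low-rank spectrum bounding the cost of each evaluation.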