Robust Multimodal Sentiment Analysis via Double Information Bottleneck

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal sentiment analysis methods suffer from two key limitations: sensitivity to unimodal noise and redundancy in cross-modal fusion. To address these, the paper proposes a Double Information Bottleneck (DIB) framework. First, a unimodal information bottleneck based on low-rank Rényi entropy compresses task-irrelevant noise while preserving discriminative features. Second, an attention-driven cross-modal information bottleneck dynamically selects complementary multimodal interactions and suppresses redundant fusion. The framework remains computationally tractable while significantly enhancing representation robustness and discriminability. The method achieves state-of-the-art performance on CMU-MOSI (47.4% Acc-7) and CH-SIMS (81.63% F1, +1.19% over the second-best baseline). Under strong artificial noise, performance degrades by only 0.29–0.36%, demonstrating superior noise resilience and generalization.

📝 Abstract
Multimodal sentiment analysis has received significant attention across diverse research domains. Despite advancements in algorithm design, existing approaches suffer from two critical limitations: insufficient learning from noise-contaminated unimodal data, leading to corrupted cross-modal interactions, and inadequate fusion of multimodal representations, which discards discriminative unimodal information while retaining redundant multimodal information. To address these challenges, this paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified, and compact multimodal representation. Implemented within the framework of the low-rank Rényi entropy functional, DIB offers enhanced robustness against diverse noise sources and computational tractability for high-dimensional data compared to conventional Shannon entropy-based methods. DIB comprises two key modules: 1) learning a sufficient and compressed representation of each unimodal input by maximizing task-relevant information and discarding superfluous information, and 2) ensuring the discriminative ability of the multimodal representation through a novel attention bottleneck fusion mechanism. Consequently, DIB yields a multimodal representation that effectively filters out noisy information from unimodal data while capturing inter-modal complementarity. Extensive experiments on CMU-MOSI, CMU-MOSEI, CH-SIMS, and MVSA-Single validate the effectiveness of the method. The model achieves 47.4% accuracy under the Acc-7 metric on CMU-MOSI and an 81.63% F1-score on CH-SIMS, outperforming the second-best baseline by 1.19%. Under noise, it shows only 0.36% and 0.29% performance degradation on CMU-MOSI and CMU-MOSEI, respectively.
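For context, the classical information bottleneck principle that the unimodal module builds on (Tishby et al.) seeks a representation $Z$ of input $X$ that is maximally informative about the label $Y$ while compressing $X$:

$$\max_{p(z \mid x)} \; I(Z; Y) - \beta \, I(Z; X)$$

where $\beta > 0$ trades off prediction against compression. In DIB, the mutual information terms are reportedly estimated with a low-rank Rényi entropy functional rather than Shannon entropy; the exact loss is the paper's contribution and is not reproduced here.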
Problem

Research questions and friction points this paper is trying to address.

Addresses insufficient learning from noise-contaminated unimodal data
Mitigates inadequate multimodal fusion that discards discriminative unimodal information
Improves robustness against noise while keeping computation tractable
Innovation

Methods, ideas, or system contributions that make the work stand out.

Double Information Bottleneck strategy for robust multimodal representation
Maximizes task-relevant information while discarding superfluous unimodal information
Novel attention bottleneck fusion mechanism capturing inter-modal complementarity
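The low-rank Rényi entropy mentioned above refers to the matrix-based Rényi α-order entropy (Sánchez Giraldo et al.), computed from the eigenvalues of a normalized Gram matrix; a low-rank variant keeps only the top-k eigenvalues for tractability. The sketch below is a minimal illustration of that general estimator, not the paper's implementation; the kernel bandwidth `sigma` and truncation rank `k` are assumed free parameters.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0, k=None):
    """Matrix-based Renyi alpha-order entropy of the samples in X (n x d).

    Illustrative sketch: a Gaussian Gram matrix is trace-normalized so its
    eigenvalues form a probability distribution, and the Renyi entropy is
    taken over that spectrum. If k is given, only the top-k eigenvalues are
    kept (low-rank approximation) and renormalized.
    """
    # Pairwise squared distances and Gaussian Gram matrix
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    A = K / np.trace(K)  # eigenvalues now sum to 1

    eigvals = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    if k is not None:
        eigvals = np.sort(eigvals)[-k:]       # low-rank truncation
        eigvals = eigvals / eigvals.sum()     # renormalize the spectrum
    # S_alpha = 1/(1-alpha) * log2( sum_i lambda_i^alpha )
    return (1.0 / (1.0 - alpha)) * np.log2(np.sum(eigvals ** alpha))
```

Identical samples yield entropy near 0 (one dominant eigenvalue), while n well-separated samples approach the maximum log2(n); truncating with `k` bounds the eigendecomposition cost at the price of an approximate spectrum.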
Huiting Huang
School of Computer Science and Technology, Xi’an Jiaotong University, 710049, China; Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, 710049, China
Tieliang Gong
Xi'an Jiaotong University
machine learning · statistical learning theory · information theory
Kai He
Saw Swee Hock School of Public Health, National University of Singapore, 119077, Singapore
Jialun Wu
School of Computer Science, Northwestern Polytechnical University, 710049, China
Erik Cambria
Professor @ NTU CCDS & Visiting @ MIT Media Lab
Neurosymbolic AI · Multimodal Interaction · NLP · Affective Computing · Sentiment Analysis
Mengling Feng
Saw Swee Hock School of Public Health, National University of Singapore, 119077, Singapore