🤖 AI Summary
Current multimodal sentiment analysis (MSA) methods generalize poorly when modalities are missing, which can result from occlusion, privacy constraints, or device failure. To address this, we propose a factorization-guided semantic recovery framework. Our method introduces: (1) a redundancy-free homogeneous–heterogeneous factorization module that disentangles cross-modal shared semantics from modality-specific representations while suppressing noise; and (2) a distribution-aligned self-distillation mechanism that enables bidirectional knowledge transfer and robust reconstruction of missing modalities. By integrating factorization networks, representation-constrained learning, and distribution alignment, the framework achieves state-of-the-art performance on CMU-MOSEI and CH-SIMS. Notably, it shows significant gains in uncertain missing-modality scenarios, supporting both its effectiveness and its generalization capability.
📝 Abstract
In recent years, Multimodal Sentiment Analysis (MSA), which aims to use multimodal data to understand human sentiment, has become a research hotspot. Previous MSA studies have mainly focused on interaction and fusion over complete multimodal data. They overlook the fact that modalities are often missing in real-world applications because of occlusion, personal privacy constraints, and device malfunctions, and as a result they generalize poorly.
To this end, we propose a Factorization-guided Semantic Recovery Framework (FSRF) to mitigate the missing-modality problem in the MSA task.
Specifically, we propose a de-redundant homo-heterogeneous factorization module that factorizes each modality into modality-homogeneous, modality-heterogeneous, and noisy representations, and we design elaborate constraint paradigms for representation learning.
Furthermore, we design a distribution-aligned self-distillation module that fully recovers the missing semantics by utilizing bidirectional knowledge transfer.
Comprehensive experiments on two datasets indicate that FSRF significantly outperforms previous methods under uncertain missing-modality conditions.
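The abstract does not give the loss functions or the missing-modality protocol, so the following is only a minimal sketch of two ideas it names, under stated assumptions. It assumes three modalities (language, audio, visual), an independent per-modality drop probability to simulate uncertain missing modalities, and a symmetric KL divergence as one possible reading of "distribution-aligned" bidirectional transfer between predictions made from complete and incomplete inputs. The function names `drop_modalities` and `symmetric_kl` are hypothetical and are not taken from FSRF.

```python
import math
import random


def drop_modalities(sample, missing_rate, rng):
    """Simulate uncertain missing modalities (assumed protocol, not FSRF's).

    Each modality is dropped independently with probability `missing_rate`;
    at least one modality is always kept. Dropped modalities become None.
    """
    kept = [name for name in sample if rng.random() >= missing_rate]
    if not kept:  # never remove every modality from a sample
        kept = [rng.choice(sorted(sample))]
    return {name: (feat if name in kept else None) for name, feat in sample.items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(logits_full, logits_missing):
    """Assumed alignment term: average of KL in both directions, so that
    neither the complete-input nor the incomplete-input branch is treated
    as a fixed target."""
    p = softmax(logits_full)
    q = softmax(logits_missing)
    return 0.5 * (kl(p, q) + kl(q, p))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-modality feature vectors for one utterance.
    sample = {"language": [0.2, 0.7], "audio": [0.1, 0.4], "visual": [0.9, 0.3]}
    print(drop_modalities(sample, missing_rate=0.5, rng=rng))
    print(symmetric_kl([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In a real training loop the two logit vectors would come from the same network fed with the complete sample and with its masked copy; the symmetric form is a design choice made here for illustration, and a one-directional KL or another divergence would fit the abstract's wording equally well.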