Robust Multi-modal Task-oriented Communications with Redundancy-aware Representations

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal semantic communication, significant intra- and inter-modal redundancy, coupled with channel distortion, severely degrades semantic fidelity. To address this, we propose a two-stage variational information bottleneck (VIB) framework jointly optimized with adversarial mutual information minimization. The first VIB stage performs modality-specific compression and the second compresses the fused cross-modal representation, while an adversarial module minimizes mutual information between modalities. This joint optimization preserves task-relevant semantics while suppressing both intra-modal and inter-modal redundancy, thereby enhancing cross-modal complementarity and robustness to channel noise. Evaluated on multimodal emotion recognition, the approach achieves substantial gains in accuracy and semantic transmission reliability at low signal-to-noise ratios, outperforming existing single- and multimodal VIB methods as well as conventional mutual information minimization techniques.
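The first-stage bottleneck described above can be sketched as a standard stochastic VIB layer. This is a minimal illustration assuming a PyTorch setup; the class and parameter names (`UniModalVIB`, `latent_dim`) are illustrative, not from the paper, and the task-relevance term would come from a downstream classifier head not shown here.

```python
import torch
import torch.nn as nn

class UniModalVIB(nn.Module):
    """Compress one modality's features into a stochastic bottleneck z.

    The encoder outputs the mean and log-variance of q(z|x); the KL term
    against a standard normal prior penalizes the bits retained in z,
    while a downstream task head supplies the task-relevance term.
    """
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x: torch.Tensor):
        mu, logvar = self.mu(x), self.logvar(x)
        # Reparameterization trick keeps the sampling step differentiable.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
        return z, kl

# Toy usage: one such bottleneck per modality, as in the first stage.
text_feat = torch.randn(8, 64)            # batch of 8 text feature vectors
vib = UniModalVIB(in_dim=64, latent_dim=16)
z, kl = vib(text_feat)
```

In the paper's setup one such bottleneck would be applied per modality (text, audio, video), with the second-stage VIB applying the same construction to the fused representation.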

📝 Abstract
Semantic communications for multi-modal data can transmit task-relevant information efficiently over noisy and bandwidth-limited channels. However, a key challenge is to simultaneously compress inter-modal redundancy and improve semantic reliability under channel distortion. To address the challenge, we propose a robust and efficient multi-modal task-oriented communication framework that integrates a two-stage variational information bottleneck (VIB) with mutual information (MI) redundancy minimization. In the first stage, we apply uni-modal VIB to compress each modality separately, i.e., text, audio, and video, while preserving task-specific features. To enhance efficiency, an MI minimization module with adversarial training is then used to suppress cross-modal dependencies and to promote complementarity rather than redundancy. In the second stage, a multi-modal VIB is further used to compress the fused representation and to enhance robustness against channel distortion. Experimental results on multi-modal emotion recognition tasks demonstrate that the proposed framework significantly outperforms existing baselines in accuracy and reliability, particularly under low signal-to-noise ratio regimes. Our work provides a principled framework that jointly optimizes modality-specific compression, inter-modal redundancy, and communication reliability.
Problem

Research questions and friction points this paper is trying to address.

Compressing inter-modal redundancy in multi-modal semantic communications
Improving semantic reliability under channel distortion conditions
Optimizing task-specific feature transmission over bandwidth-limited channels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage VIB first compresses each modality separately, then the fused representation
MI minimization reduces cross-modal redundancy via adversarial training
Multimodal VIB enhances robustness against channel distortion
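The adversarial MI-minimization bullet above can be illustrated with a density-ratio discriminator, a common way to estimate and then suppress mutual information between two embeddings. This is a hedged sketch under the assumption of a Donsker-Varadhan-style estimator; the paper's exact estimator and network shapes may differ, and all names here (`MIDiscriminator`, `mi_estimate`) are illustrative.

```python
import torch
import torch.nn as nn

class MIDiscriminator(nn.Module):
    """Scores pairs of modality embeddings (z_a, z_b)."""
    def __init__(self, dim_a: int, dim_b: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([za, zb], dim=1))

def mi_estimate(disc: MIDiscriminator, za: torch.Tensor, zb: torch.Tensor):
    """Donsker-Varadhan-style bound: E_joint[T] - log E_marginal[exp(T)].

    Joint pairs come from the same sample; "marginal" pairs are formed by
    shuffling z_b within the batch to break the pairing.
    """
    joint = disc(za, zb).mean()
    perm = zb[torch.randperm(zb.size(0))]
    marginal = torch.logsumexp(disc(za, perm), dim=0) \
        - torch.log(torch.tensor(float(za.size(0))))
    return joint - marginal

# Toy usage with two 8-d modality embeddings for a batch of 16.
za, zb = torch.randn(16, 8), torch.randn(16, 8)
disc = MIDiscriminator(8, 8)
mi = mi_estimate(disc, za, zb)
```

In an adversarial loop, the discriminator would be trained to maximize this estimate (tightening the bound), while the modality encoders are trained to minimize it, pushing the two embeddings toward complementary rather than redundant content.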