G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation

๐Ÿ“… 2025-06-26
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Multimodal learning commonly suffers from modality imbalance—dominant modalities monopolize optimization while weaker ones contribute insufficiently, limiting collaborative representation learning. To address this, we propose a gradient-guided knowledge distillation framework coupled with a novel dynamic Sequential Modality Prioritization (SMP) mechanism, the first to let modalities alternately assume optimization dominance during training and thereby enforce substantial gradient participation from weaker modalities. We further design a customized loss function integrating unimodal supervision and multimodal alignment constraints. Evaluated across multiple real-world multimodal benchmarks, our method significantly enhances the contribution of weaker modalities and consistently outperforms state-of-the-art approaches on both classification and regression tasks. These results demonstrate its effectiveness and generalizability in modeling heterogeneous modalities collaboratively.

๐Ÿ“ Abstract
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique in the learning process to ensure each modality leads the learning process, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that G$^{2}$D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks. Our code is available at https://github.com/rAIson-Lab/G2D.
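The abstract describes a loss that fuses unimodal and multimodal objectives. The paper's exact formulation (including its distillation terms) is not given on this page, so the sketch below is only a minimal illustration of the general idea: a weighted sum of the fused multimodal cross-entropy and per-modality unimodal cross-entropy terms. The function name `g2d_style_loss` and the weights `lambda_multi`/`lambda_uni` are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood of the true class
    p = softmax(logits)
    n = logits.shape[0]
    return -np.log(p[np.arange(n), labels] + 1e-12).mean()

def g2d_style_loss(fused_logits, unimodal_logits, labels,
                   lambda_multi=1.0, lambda_uni=0.5):
    """Hypothetical combined objective: multimodal CE on the fused
    prediction plus a weighted sum of per-modality unimodal CE terms,
    so each modality also receives direct supervision."""
    loss = lambda_multi * cross_entropy(fused_logits, labels)
    for logits_m in unimodal_logits:
        loss += lambda_uni * cross_entropy(logits_m, labels)
    return loss
```

Because every modality's classifier is supervised directly, a weak modality still receives gradient signal even when the fused prediction is already dominated by a stronger one.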
Problem

Research questions and friction points this paper is trying to address.

Addressing modality imbalance in multimodal learning models
Enhancing weak modalities' contribution during model training
Improving multimodal feature representation and performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient-Guided Distillation for multimodal learning
Dynamic sequential modality prioritization technique
Custom loss function fusing unimodal and multimodal objectives
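The SMP idea above—letting each modality take turns leading optimization—can be sketched as a simple round-robin gradient-weighting schedule. This is an illustrative guess at the mechanism, not the paper's actual scheduler: the function name and the `lead_scale`/`follow_scale` values are assumptions.

```python
def smp_gradient_scales(epoch, num_modalities,
                        lead_scale=1.0, follow_scale=0.2):
    """Round-robin leader schedule: the modality at index
    (epoch % num_modalities) gets full gradient weight while the
    others are down-weighted, so each modality periodically drives
    optimization instead of being overshadowed."""
    leader = epoch % num_modalities
    return [lead_scale if m == leader else follow_scale
            for m in range(num_modalities)]
```

In training, these scales would multiply each modality branch's gradients (or losses) per epoch, which is one plausible way to enforce the "alternating dominance" the summary describes.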
๐Ÿ”Ž Similar Papers
No similar papers found.