AI Summary
Multimodal learning commonly suffers from modality imbalance, where dominant modalities monopolize optimization while weaker ones contribute insufficiently, limiting collaborative representation learning. To address this, we propose a gradient-guided knowledge distillation framework coupled with a novel Dynamic Sequential Modality Prioritization (SMP) mechanism, the first to let modalities alternately assume optimization dominance during training, thereby enforcing substantial gradient participation from weaker modalities. We further design a customized loss function that integrates unimodal supervision with multimodal alignment constraints. Evaluated on multiple real-world multimodal benchmarks, our method significantly enhances the contribution of weaker modalities and consistently outperforms state-of-the-art approaches on both classification and regression tasks. These results demonstrate its effectiveness and generalizability for collaboratively modeling heterogeneous modalities.
Abstract
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representations and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function fusing both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique to ensure that each modality in turn leads the optimization, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that it amplifies the contribution of weak modalities during training and outperforms state-of-the-art methods on classification and regression tasks. Our code is available at https://github.com/rAIson-Lab/G2D.
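To make the idea concrete, here is a minimal toy sketch of sequential modality prioritization on a two-modality linear regressor. It is an illustrative reading of the abstract, not the paper's implementation: the round-robin schedule, averaging-based fusion, the `follow_scale` damping factor, and the `align_w` weighting of unimodal versus fused loss terms are all assumptions introduced for this example.

```python
import numpy as np

def smp_schedule(step, num_modalities):
    """Round-robin choice of which modality leads optimization at this
    step (one plausible reading of 'sequential modality prioritization')."""
    return step % num_modalities

def train_step(weights, x_mods, y, step, lr=0.1, follow_scale=0.1, align_w=0.5):
    """One SMP-style update for a toy two-modality linear regressor.

    Each modality m has its own weight vector; the fused prediction is the
    mean of the unimodal predictions. The gradient combines a unimodal MSE
    term with a fused (alignment) term, and only the leading modality
    receives its full gradient -- the others are down-scaled so no single
    modality monopolizes optimization.
    """
    lead = smp_schedule(step, len(weights))
    preds = [x @ w for w, x in zip(weights, x_mods)]
    fused = np.mean(preds, axis=0)

    new_weights = []
    for m, (w, x, p) in enumerate(zip(weights, x_mods, preds)):
        g_uni = 2 * x.T @ (p - y) / len(y)                       # unimodal MSE grad
        g_fuse = 2 * x.T @ (fused - y) / (len(y) * len(weights))  # fused-loss share
        g = g_uni + align_w * g_fuse
        if m != lead:
            g = follow_scale * g  # non-leading modality: damped participation
        new_weights.append(w - lr * g)
    return new_weights, float(np.mean((fused - y) ** 2))
```

Running `train_step` in a loop alternates which modality drives the update while both still receive some gradient signal; the fused MSE decreases over iterations on simple synthetic data.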