🤖 AI Summary
In multi-modal domain generalization (MMDG), weight averaging (WA) suffers from modality bias and degraded generalization due to disparate optimization speeds across modalities. To address this, we propose Modality-Balanced Collaborative Distillation (MBCD). MBCD mitigates convergence imbalance via adaptive modality dropout, enforces gradient consistency to improve cross-modal optimization coordination, and integrates WA into a multi-branch collaborative distillation pipeline that jointly models cross-modal knowledge transfer and flatness-aware optimization. Its core innovation lies in unifying distillation and ensemble learning through WA, thereby guiding the model toward flatter, more generalizable optima. Extensive experiments on multiple MMDG benchmarks demonstrate that MBCD significantly improves cross-domain accuracy and robustness over state-of-the-art methods.
📝 Abstract
Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape, which correlates with stronger out-of-distribution performance. However, applying WA directly to multi-modal domain generalization (MMDG) is challenging: differences in optimization speed across modalities cause WA to overfit to faster-converging modalities in early stages, suppressing the contribution of slower yet complementary ones. This hinders effective modality fusion and skews the loss surface toward sharper, less generalizable minima. To address this issue, we propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts. MBCD begins with adaptive modality dropout in the student model to curb early-stage bias toward dominant modalities. A gradient consistency constraint then aligns learning signals between uni-modal branches and the fused representation, encouraging coordinated and smoother optimization. Finally, a WA-based teacher conducts cross-modal distillation by transferring fused knowledge to each uni-modal branch, which strengthens cross-modal interactions and steers convergence toward flatter solutions. Extensive experiments on MMDG benchmarks show that MBCD consistently outperforms existing methods, achieving superior accuracy and robustness across diverse unseen domains.
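To make the two core mechanics concrete, here is a minimal, framework-free sketch of (a) an exponential-moving-average form of the WA teacher update and (b) adaptive modality dropout that drops faster-converging modalities more often. Everything here is illustrative: the function names, the EMA formulation (the paper may use simple iterate averaging instead), and the per-modality `convergence` scores used as a dropout schedule are all assumptions, not the authors' exact algorithm.

```python
def wa_update(teacher, student, alpha=0.99):
    """Update a weight-averaged (EMA-style) teacher from student weights.

    Weights are plain lists of floats here for illustration; `alpha`
    controls how slowly the teacher tracks the student (a higher value
    gives a smoother, flatter average of the student's trajectory).
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def adaptive_modality_dropout(features, convergence, rng):
    """Drop faster-converging modalities more often to curb modality bias.

    `features` maps modality name -> feature vector; `convergence` maps
    modality name -> a score in [0, 1] estimating how far along that
    modality's optimization is (a hypothetical proxy -- the paper's exact
    dropout schedule is not specified here). A modality survives with
    probability (1 - convergence), so dominant modalities are suppressed.
    """
    kept = {m: f for m, f in features.items() if rng.random() > convergence[m]}
    # Never drop every modality: fall back to the least-converged one.
    if not kept:
        m = min(convergence, key=convergence.get)
        kept = {m: features[m]}
    return kept
```

In a full training loop, the student would be trained on the surviving modalities, the WA teacher refreshed via `wa_update`, and the teacher's fused predictions distilled back into each uni-modal branch.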