🤖 AI Summary
In CLIP-based domain generalization, multi-source training induces sample conflicts (e.g., noisy samples and extreme domain shifts) and optimization conflicts (e.g., gradient competition and trade-offs among objectives). To address these challenges, this paper proposes the Harmonizing and Merging (HAM) framework. Rather than jointly training a single model, HAM trains individual source models and achieves synergistic optimization through three key components: (1) conflict-free sample enrichment to mitigate data noise; (2) a gradient-direction harmonization mechanism that aligns parameter updates across source models during training; and (3) a redundancy-aware historical model merging strategy that selectively weights and integrates models to enhance knowledge complementarity. Evaluated on five mainstream benchmarks, HAM consistently outperforms existing state-of-the-art methods, demonstrating that model merging substantially improves generalization to unseen domains.
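The gradient-direction harmonization idea can be sketched in the spirit of conflict-averse gradient surgery (a PCGrad-style projection). The function name and the projection rule below are illustrative assumptions, not HAM's exact formulation: when two source models' gradients point in conflicting directions (negative inner product), each gradient drops the component that opposes the other.

```python
import numpy as np

def harmonize(gradients):
    """Project each gradient away from directions that conflict
    (negative inner product) with the other source gradients.
    Illustrative PCGrad-style rule, not HAM's exact mechanism."""
    harmonized = []
    for i, g in enumerate(gradients):
        g = g.copy()
        for j, other in enumerate(gradients):
            if i == j:
                continue
            dot = float(g @ other)
            if dot < 0:  # conflicting direction: remove that component
                g -= dot / (float(other @ other) + 1e-12) * other
        harmonized.append(g)
    return harmonized

# Two conflicting source-domain gradients
g1 = np.array([1.0, 1.0])
g2 = np.array([-1.0, 0.5])
h1, h2 = harmonize([g1, g2])
# After projection, neither update opposes the other source's direction
```

After harmonization, each update is non-negative along every other source gradient, so one domain's step no longer undoes another's progress.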
📝 Abstract
CLIP-based domain generalization aims to improve model generalization to unseen domains by leveraging the powerful zero-shot classification capabilities of CLIP and multiple source datasets. Existing methods typically train a single model across multiple source domains to capture domain-shared information. However, this paradigm inherently suffers from two types of conflicts: 1) sample conflicts, arising from noisy samples and extreme domain shifts among sources; and 2) optimization conflicts, stemming from competition and trade-offs during multi-source training. Both hinder generalization and lead to suboptimal solutions. Recent studies have shown that model merging can effectively mitigate the competition of multi-objective optimization and improve generalization performance. Inspired by these findings, we propose Harmonizing and Merging (HAM), a novel source model merging framework for CLIP-based domain generalization. During the training of the source models, HAM enriches the source samples while excluding conflicting ones, and harmonizes the update directions of all models. Then, a redundancy-aware historical model merging method is introduced to effectively integrate knowledge across all source models. HAM comprehensively consolidates source-domain information while enabling mutual enhancement among the source models, ultimately yielding a final model with strong generalization capability. Extensive experiments on five widely used benchmark datasets demonstrate the effectiveness of our approach, achieving state-of-the-art performance.
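The redundancy-aware merging step can be illustrated with a minimal sketch over flattened parameter vectors. The weighting scheme below (down-weighting models that are highly similar to the rest, so that complementary models contribute more) is an assumption for illustration; HAM's exact redundancy criterion may differ.

```python
import numpy as np

def redundancy_aware_merge(models):
    """Merge source-model parameter vectors with weights that
    down-weight redundant models (those most similar to the others).
    Illustrative weighting scheme, not HAM's exact criterion."""
    models = [np.asarray(m, dtype=float) for m in models]
    n = len(models)

    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Redundancy of model i = mean cosine similarity to the other models
    redundancy = np.array([
        np.mean([cos(models[i], models[j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    # Less redundant models carry more complementary knowledge
    weights = np.exp(-redundancy)
    weights /= weights.sum()
    merged = sum(w * m for w, m in zip(weights, models))
    return merged, weights

# Two near-duplicate source models and one dissimilar model
merged, w = redundancy_aware_merge([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
# The dissimilar third model receives the largest weight
```

A plain uniform average would let the two near-duplicate models dominate; the redundancy-aware weights instead preserve the complementary knowledge of the dissimilar model.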