Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model

Date: 2026-05-04

AI Summary
This work addresses the persistent reliance on source-pretrained models in existing source-free domain adaptation (SFDA) methods, which implicitly retain a dependence on the source domain. To overcome this limitation, we introduce VODA, a stricter source-free setting that requires no source model at all and uses only a randomly initialized network, a vision-language model, and unlabeled target data for adaptation. We propose Two-Stage Denoised-Region Distillation (TS-DRD), a framework that uses vision-language guidance to identify reliable target regions and distills progressively cleaner knowledge over two stages. Experiments on Office-Home, VisDA, and DomainNet-126 show that our method matches or surpasses conventional SFDA approaches that depend on source models, which suggests that source models play a surprisingly limited role in current methods.
๐Ÿ“ Abstract
Source-Free Domain Adaptation (SFDA) adapts source models to target domains without accessing source data, addressing privacy and transmission issues. However, existing methods still initialize from a source pre-trained model and thus are not truly source-free. Recent works have introduced Vision-Language (ViL) models to guide the adaptation process. In these methods, we observe that for the same target domain, different source models yield minimal variation in final results, indicating that the source model itself has limited impact. Motivated by this, we propose ViL-Only Domain Adaptation (VODA), a stricter setting that eliminates all dependencies on the source domain and relies solely on a randomly initialized model, a ViL model, and unlabeled target data. We analyze the adaptation dynamics of VODA and introduce Two-Stage Denoised-Region Distillation (TS-DRD), a two-stage framework that first warms up the model with ViL guidance and then seeks a Denoised-Region inherent in both the ViL model and the adapting model, yielding cleaner supervision for distillation. Experiments on Office-Home, VisDA, and DomainNet-126 show that under VODA, TS-DRD achieves competitive or superior performance compared with existing SFDA methods that still use source models, demonstrating its effectiveness and the potential of the VODA setting.
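The abstract does not define the Denoised-Region precisely, so the following is only an illustrative sketch of the general idea of distilling on a cleaner subset. It assumes, purely for illustration, that a sample counts as "denoised" when the ViL model and the adapting model predict the same class. The function names (`agreement_region`, `distill_loss`) and the agreement rule are hypothetical and are not taken from the paper.

```python
import math

def argmax(row):
    """Index of the largest probability in one prediction row."""
    return max(range(len(row)), key=lambda i: row[i])

def agreement_region(vil_probs, model_probs):
    """Indices of samples where both predictors pick the same class.

    ASSUMPTION: top-1 agreement stands in for the paper's Denoised-Region,
    whose actual definition is not given in the abstract.
    """
    return [
        i
        for i, (v, m) in enumerate(zip(vil_probs, model_probs))
        if argmax(v) == argmax(m)
    ]

def distill_loss(vil_probs, model_probs, region, eps=1e-12):
    """Mean cross-entropy of the model against ViL soft labels,
    computed only over the selected region."""
    if not region:
        return 0.0
    total = 0.0
    for i in region:
        total += -sum(
            v * math.log(m + eps)
            for v, m in zip(vil_probs[i], model_probs[i])
        )
    return total / len(region)

# Toy predictions for three unlabeled target samples, two classes.
vil = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
model = [[0.7, 0.3], [0.6, 0.4], [0.55, 0.45]]

region = agreement_region(vil, model)   # sample 1 is excluded: the two disagree
loss = distill_loss(vil, model, region)
```

In a real training loop the loss would be backpropagated through the adapting network and the region recomputed as the model improves; the warm-up stage described in the abstract would correspond to distilling on all samples before any such filtering is applied.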
Problem

Research questions and friction points this paper is trying to address.

Source-Free Domain Adaptation
Vision-Language Model
Domain Adaptation
Source Model Dependency
Unsupervised Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Domain Adaptation
Vision-Language Model
Model Initialization from Scratch
Knowledge Distillation
Denoised-Region