Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the reliance of existing source-free domain adaptation (SFDA) methods on source-pretrained models, which leaves them implicitly dependent on the source domain. To overcome this limitation, we introduce ViL-Only Domain Adaptation (VODA), a stricter setting that requires no source model at all and uses only a randomly initialized network, a vision-language (ViL) model, and unlabeled target data. We propose Two-Stage Denoised-Region Distillation (TS-DRD), a framework that first warms up the model under ViL guidance and then identifies a denoised region shared by the ViL model and the adapting model, which supplies cleaner supervision for distillation. Experiments on Office-Home, VisDA, and DomainNet-126 show that our method matches or surpasses conventional SFDA approaches that depend on source models, suggesting that source models play a limited role in current ViL-guided methods.

📝 Abstract

Source-Free Domain Adaptation (SFDA) adapts source models to target domains without accessing source data, addressing privacy and transmission issues. However, existing methods still initialize from a source pre-trained model and thus are not truly source-free. Recent works have introduced Vision-Language (ViL) models to guide the adaptation process. In these methods, we observe that for the same target domain, different source models yield minimal variation in final results, indicating that the source model itself has limited impact. Motivated by this, we propose ViL-Only Domain Adaptation (VODA), a stricter setting that eliminates all dependencies on the source domain, relying solely on a randomly initialized model, a ViL model, and unlabeled target data. We analyze the adaptation dynamics of VODA and introduce Two-Stage Denoised-Region Distillation (TS-DRD), a two-stage framework that first warms up the model with ViL guidance, then seeks a Denoised-Region inherent in both the ViL model and the adapting model, yielding cleaner supervision for distillation. Experiments on Office-Home, VisDA, and DomainNet-126 show that under VODA, TS-DRD achieves competitive or superior performance to existing SFDA methods that still use source models, demonstrating its effectiveness and the potential of the VODA setting.
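
The abstract does not say how the Denoised-Region is computed, so the following is only an illustrative sketch of the two-stage idea, not the authors' method. It assumes, hypothetically, that a sample is treated as "denoised" when the ViL model and the adapting model agree on the top class and the ViL confidence clears a threshold. The function names and the `min_conf` parameter are invented for this illustration.

```python
import math

def argmax(probs):
    """Index of the largest probability in a list."""
    return max(range(len(probs)), key=probs.__getitem__)

def select_agreement_region(vil_probs, model_probs, min_conf=0.5):
    """ASSUMPTION: stand-in for the Denoised-Region. Keeps (index, label)
    for samples where both predictors agree on the top class and the
    ViL confidence for that class is at least min_conf."""
    selected = []
    for i, (p_vil, p_model) in enumerate(zip(vil_probs, model_probs)):
        label = argmax(p_vil)
        if label == argmax(p_model) and p_vil[label] >= min_conf:
            selected.append((i, label))
    return selected

def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """Mean cross-entropy of student predictions against teacher soft labels."""
    total = 0.0
    for t, s in zip(teacher_probs, student_probs):
        total -= sum(ti * math.log(max(si, eps)) for ti, si in zip(t, s))
    return total / len(teacher_probs)

# Toy ViL and adapting-model class probabilities for four target samples.
vil_probs = [[0.8, 0.1, 0.1], [0.4, 0.35, 0.25], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
model_probs = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.3, 0.3, 0.4], [0.2, 0.1, 0.7]]

# Stage 1 (warm-up): distill from the ViL model on every target sample.
warmup_loss = distillation_loss(vil_probs, model_probs)

# Stage 2: distill only on the selected region.
region = select_agreement_region(vil_probs, model_probs)
idx = [i for i, _ in region]
stage2_loss = distillation_loss([vil_probs[i] for i in idx],
                                [model_probs[i] for i in idx])
```

In this toy data, sample 1 is dropped for low ViL confidence and sample 2 for disagreement, so stage 2 trains on samples 0 and 3 only. A real pipeline would compute these quantities per mini-batch with a deep learning framework and backpropagate through the student.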

Problem

Research questions and friction points this paper is trying to address.

Source-Free Domain Adaptation

Vision-Language Model

Domain Adaptation

Source Model Dependency

Unsupervised Adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Domain Adaptation

Vision-Language Model

Model Initialization from Scratch