SIDA: Synthetic Image Driven Zero-shot Domain Adaptation

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing zero-shot domain adaptation (ZSDA) methods rely on textual descriptions to model target-domain style, which poorly captures complex real-world distribution shifts and incurs high alignment overhead and long adaptation latency. To address these limitations, we propose a synthetic-image-driven ZSDA framework: target-style synthetic images are generated via image translation—replacing hand-crafted text prompts—and serve as explicit style references. We introduce two novel modules—Domain Mix and Patch Style Transfer—to enable multi-style fusion and fine-grained local style transfer, respectively. Crucially, style features are extracted and transferred within the CLIP embedding space to preserve semantic consistency. Our approach significantly enhances modeling capability for severe domain shifts, achieving state-of-the-art performance across multiple ZSDA benchmarks—especially under challenging domain gaps—while reducing adaptation time. The method thus advances both efficiency and generalization in zero-shot domain adaptation.

📝 Abstract
Zero-shot domain adaptation (ZSDA) aims to adapt a model to a target domain without using any target-domain image data. To enable adaptation without target images, existing studies utilize CLIP's embedding space and text descriptions to simulate target-like style features. Despite the achievements of prior work in zero-shot domain adaptation, we observe that these text-driven methods struggle to capture complex real-world variations and significantly increase adaptation time due to their alignment process. Instead of relying on text descriptions, we explore solutions leveraging image data, which provide diverse and more fine-grained style cues. In this work, we propose SIDA, a novel and efficient zero-shot domain adaptation method leveraging synthetic images. To generate synthetic images, we first create detailed, source-like images and apply image translation to reflect the style of the target domain. We then utilize the style features of these synthetic images as a proxy for the target domain. Based on these features, we introduce the Domain Mix and Patch Style Transfer modules, which enable effective modeling of real-world variations. In particular, Domain Mix blends multiple styles to expand the intra-domain representations, and Patch Style Transfer assigns different styles to individual patches. We demonstrate the effectiveness of our method by showing state-of-the-art performance in diverse zero-shot adaptation scenarios, particularly in challenging domains. Moreover, our approach achieves high efficiency by significantly reducing the overall adaptation time.
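The abstract does not give equations for the two modules, but style-feature manipulation of this kind is commonly realized as AdaIN-style statistics transfer. The sketch below illustrates, under that assumption, how Domain Mix might blend the channel-wise statistics of several style references with random convex weights, and how Patch Style Transfer might assign a different reference style to each spatial patch. All function names and the choice of per-channel mean/std as the "style" are illustrative, not taken from the paper.

```python
import numpy as np

def channel_stats(feat, eps=1e-5):
    """Per-channel mean and std of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    return mu, sigma

def adain(content, mu_s, sigma_s):
    """Re-normalize content features to the given style statistics (AdaIN)."""
    mu_c, sigma_c = channel_stats(content)
    return sigma_s * (content - mu_c) / sigma_c + mu_s

def domain_mix(content, style_feats, rng):
    """Blend several reference styles with random convex weights,
    then transfer the mixed style onto the content features."""
    w = rng.dirichlet(np.ones(len(style_feats)))
    stats = [channel_stats(f) for f in style_feats]
    mu = sum(wi * m for wi, (m, _) in zip(w, stats))
    sigma = sum(wi * s for wi, (_, s) in zip(w, stats))
    return adain(content, mu, sigma)

def patch_style_transfer(content, style_feats, patch, rng):
    """Assign a randomly chosen reference style to each spatial patch."""
    out = content.copy()
    _, H, W = content.shape
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            ref = style_feats[rng.integers(len(style_feats))]
            mu_s, sigma_s = channel_stats(ref)
            out[:, i:i + patch, j:j + patch] = adain(
                content[:, i:i + patch, j:j + patch], mu_s, sigma_s)
    return out

# Toy usage: 8-channel, 16x16 feature maps with three synthetic style references.
rng = np.random.default_rng(0)
content = rng.normal(size=(8, 16, 16))
styles = [rng.normal(loc=k, size=(8, 16, 16)) for k in range(3)]
mixed = domain_mix(content, styles, rng)
patched = patch_style_transfer(content, styles, patch=8, rng=rng)
print(mixed.shape, patched.shape)
```

The random Dirichlet weights make each mixed style a convex combination of the references, expanding intra-domain variation, while per-patch assignment injects spatially heterogeneous shifts into a single feature map.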
Problem

Research questions and friction points this paper is trying to address.

Adapting models to target domains without target images
Overcoming limitations of text-driven style feature simulation
Reducing adaptation time while capturing real-world variations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages synthetic images for domain adaptation
Uses Domain Mix to blend multiple styles
Applies Patch Style Transfer to individual patches