Latent Domain Prompt Learning for Vision-Language Models

📅 2025-10-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses domain generalization (DG) for vision-language models (VLMs) in the absence of explicit domain labels. The proposed method enables unsupervised latent domain modeling and adaptive transfer without domain supervision. Methodologically, it (1) introduces image-feature-driven automatic clustering of latent domains to implicitly characterize unknown target-domain distributions, and (2) designs a cross-modal similarity-guided text feature fusion mechanism to support domain-aware prompt learning. By tightly coupling visual and linguistic representations within an alignment framework, the approach jointly discovers domain structure and transfers knowledge—entirely without domain annotations. Evaluated on four standard DG benchmarks, the method consistently outperforms existing VLM-based baselines, demonstrating superior robustness and generalization under domain shift.

📝 Abstract
The objective of domain generalization (DG) is to make models robust against domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may be unavailable and are often ambiguous. We instead study the DG setting where models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from the training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features based on the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new insights into improving robustness under domain shift.
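The mechanism described above can be sketched in a few lines: cluster training image features into latent domains, then, for a new image, fuse the per-domain text features with softmax weights given by the image's similarity to each domain centroid. This is a minimal illustrative sketch, not the paper's implementation; the use of k-means, cosine similarity, the temperature `tau`, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=20):
    """Minimal k-means: returns (k, d) centroids for features x of shape (n, d)."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest centroid
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(0)
    return centroids

def fuse_text_features(img_feat, centroids, domain_text_feats, tau=0.07):
    """Fuse per-domain text features, weighted by the cosine similarity
    between the image feature and each latent-domain centroid
    (softmax with temperature tau -- an assumed design choice)."""
    img = img_feat / np.linalg.norm(img_feat)
    cen = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = cen @ img                       # (k,) cosine similarities
    w = np.exp(sims / tau)
    w /= w.sum()                           # softmax weights over latent domains
    return (w[:, None] * domain_text_feats).sum(0)  # (d,) fused text feature

# toy demo: 200 image features, 4 latent domains, one text feature per domain
img_feats = rng.normal(size=(200, 16))
centroids = kmeans(img_feats, k=4)
text_feats = rng.normal(size=(4, 16))
fused = fuse_text_features(img_feats[0], centroids, text_feats)
print(fused.shape)  # (16,)
```

In a full system the image and text features would come from a frozen VLM encoder (e.g. CLIP-style), and the fused text feature would drive domain-aware prompt learning; the sketch only shows the clustering-and-fusion skeleton.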
Problem

Research questions and friction points this paper is trying to address.

Generalize vision-language models without domain labels
Represent unseen domains via latent domain combinations
Adaptively transfer knowledge across discovered latent domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discovers latent domains via clustering image features
Fuses domain-specific text features using similarity
Enables adaptive knowledge transfer across latent domains
Zhixing Li
Ph.D. candidate, NUDT
empirical software engineering, social coding
Arsham Gholamzadeh Khoee
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden
Yinan Yu
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden