🤖 AI Summary
This work addresses a key limitation of semi-supervised domain generalization (SSDG): overreliance on pseudo-label accuracy, which restricts data utilization and model generalization. To mitigate this, the authors propose a vision-language model (VLM)-guided feature alignment approach that aligns intermediate model features with the semantically rich embedding space of a VLM to enhance domain invariance. The method integrates image-level augmentation and output-level regularization to improve data efficiency and alleviate overfitting. Notably, it is the first to leverage VLMs for semantic alignment in SSDG, achieving high pseudo-label quality while maximizing the use of unlabeled data. Extensive experiments on four standard benchmarks demonstrate state-of-the-art performance, significantly outperforming existing methods.
📝 Abstract
Semi-supervised Domain Generalization (SSDG) addresses the challenge of generalizing to unseen target domains with limited labeled data. Existing SSDG methods identify achieving high pseudo-labeling (PL) accuracy and preventing model overfitting as the main challenges in SSDG. In this light, we show that the literature's excessive focus on PL accuracy, without regard for maximizing data utilization during training, limits potential performance improvements. We propose a novel approach to the SSDG problem that aligns the intermediate features of our model with the semantically rich and generalized feature space of a Vision-Language Model (VLM) in a way that promotes domain invariance. This approach is further enhanced with effective image-level augmentation and output-level regularization strategies to improve data utilization and minimize overfitting. Extensive experimentation across four benchmarks against existing SSDG baselines shows that our method achieves state-of-the-art results both qualitatively and quantitatively. The code will be made publicly available.
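The central mechanism, pulling a model's intermediate features toward a frozen VLM's embedding space, can be sketched as a simple cosine-distance alignment objective. The NumPy formulation and names below (`vlm_alignment_loss`, `student_feats`, `vlm_embeds`) are illustrative assumptions, not the paper's actual implementation, which may use a learned projection head and additional domain-invariance terms:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize each row to unit length (standard for VLM embeddings)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def vlm_alignment_loss(student_feats, vlm_embeds):
    """Mean cosine distance (1 - cosine similarity) between the model's
    projected intermediate features and frozen VLM image embeddings.

    student_feats, vlm_embeds: arrays of shape (batch, dim), assumed to
    already live in the same dimensional space (e.g. after a linear
    projection of the student features). Hypothetical sketch only.
    """
    s = l2_normalize(student_feats)
    v = l2_normalize(vlm_embeds)
    cos_sim = np.sum(s * v, axis=-1)
    return float(np.mean(1.0 - cos_sim))
```

Minimizing this term encourages the student's features, regardless of source domain, to occupy the same semantic regions as the VLM's embeddings; in practice it would be combined with the supervised, pseudo-label, and regularization losses described above.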