AI Summary
To address category confusion arising from label scarcity, and the insufficient learning of tail classes caused by long-tailed class distributions, in semi-supervised domain adaptation (SSDA) for semantic segmentation, this paper proposes a language-guided SSDA framework. Methodologically, it introduces pretrained vision-language models (VLMs) as cross-domain semantic bridges for the first time, integrating contrastive learning, confidence-aware pseudo-label refinement, and a gradient-balanced, class-balanced segmentation loss to explicitly mitigate distribution shift and class bias. Key contributions include: (1) leveraging the open-vocabulary semantic generalization of VLMs to enhance discriminability for unseen target-domain categories; and (2) designing a gradient-balanced, class-balanced loss to improve recall on tail classes. The framework achieves significant improvements over state-of-the-art methods on mainstream benchmarks (e.g., GTA→Cityscapes), with mIoU gains of 3.2–5.7 percentage points on fine-grained and visually similar categories.
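The summary's "class-balanced loss" for tail classes can be illustrated with a common reweighting scheme: weight each class's cross-entropy term inversely to its pixel frequency (here via effective-number reweighting). This is a minimal NumPy sketch under that assumption, not the paper's exact loss; the function names and the `beta` hyperparameter are illustrative.

```python
import numpy as np

def class_balanced_weights(label_map, num_classes, beta=0.999):
    # Effective-number reweighting: w_c = (1 - beta) / (1 - beta^n_c),
    # so rare classes (small pixel count n_c) receive larger weights.
    counts = np.bincount(label_map.ravel(), minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against absent classes
    weights = (1.0 - beta) / (1.0 - beta ** counts)
    return weights * num_classes / weights.sum()  # normalize to mean 1

def weighted_cross_entropy(probs, labels, weights):
    # probs: (H, W, C) softmax outputs; labels: (H, W) integer class map.
    h, w, c = probs.shape
    p_true = probs.reshape(-1, c)[np.arange(h * w), labels.ravel()]
    per_pixel = -np.log(np.clip(p_true, 1e-8, 1.0))
    return float(np.mean(weights[labels.ravel()] * per_pixel))
```

With a label map where class 1 is rare, `class_balanced_weights` assigns it a larger weight than the frequent class 0, so its gradient contribution is amplified during training.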
Abstract
Domain Adaptation (DA) and Semi-supervised Learning (SSL) converge in Semi-supervised Domain Adaptation (SSDA), where the objective is to transfer knowledge from a source domain to a target domain using a combination of limited labeled target samples and abundant unlabeled target data. Although intuitive, a simple amalgamation of DA and SSL is suboptimal in semantic segmentation for two major reasons: (1) previous methods, while able to learn good segmentation boundaries, are prone to confusing classes with similar visual appearance due to limited supervision; and (2) a skewed and imbalanced training-data distribution favors source representation learning while impeding exploitation of the limited information available for tail classes. Language guidance can serve as a pivotal semantic bridge, facilitating robust class discrimination and mitigating visual ambiguities by leveraging the rich semantic relationships encoded in pre-trained language models to enhance feature representations across domains. Therefore, we propose the first language-guided SSDA setting for semantic segmentation in this work. Specifically, we harness the semantic generalization capabilities inherent in vision-language models (VLMs) to establish a synergistic framework within the SSDA paradigm. To address the inherent class-imbalance challenges in long-tailed distributions, we introduce class-balanced segmentation loss formulations that effectively regularize the learning process. Through extensive experimentation across diverse domain adaptation scenarios, our approach demonstrates substantial performance improvements over contemporary state-of-the-art (SoTA) methodologies. Code is available at: https://github.com/hritam-98/SemiDAViL
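The abstract's idea of language as a semantic bridge is commonly realized by scoring dense image features against frozen per-class text embeddings (e.g., CLIP-style prompts such as "a photo of a {class}"). The sketch below assumes pixel features already projected into the VLM embedding space; the function name, prompt template, and `temperature` value are illustrative, not the paper's architecture.

```python
import numpy as np

def language_guided_logits(pixel_feats, text_embeds, temperature=0.07):
    """Score each pixel feature against per-class text embeddings.

    pixel_feats: (H, W, D) image features assumed projected to the VLM space.
    text_embeds: (C, D) frozen text embeddings, one per class name.
    Returns (H, W, C) cosine-similarity logits scaled by a temperature.
    """
    f = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # Cosine similarity between every pixel feature and every class embedding.
    return np.einsum("hwd,cd->hwc", f, t) / temperature
```

Because the text embeddings encode semantic relationships between class names, visually similar classes that are semantically distinct (e.g., "road" vs. "sidewalk") get separated anchors in the shared space, which is the intuition behind the improved discriminability claimed in the paper.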