SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation

๐Ÿ“… 2025-04-08
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address category confusion due to label scarcity and insufficient learning of tail classes caused by long-tailed class distributions in semi-supervised domain adaptive (SSDA) semantic segmentation, this paper proposes a language-guided SSDA framework. Methodologically, it introduces pretrained vision-language models (VLMs) as cross-domain semantic bridges for the first time, integrating contrastive learning, confidence-aware pseudo-label refinement, and a gradient-balanced class-balanced segmentation loss to explicitly mitigate distribution shift and class bias. Key contributions include: (1) leveraging the open-vocabulary semantic generalization capability of VLMs to enhance discriminability for unseen target-domain categories; and (2) designing a gradient-balanced class-balanced loss to improve recall of tail classes. The framework achieves significant improvements over state-of-the-art methods on mainstream benchmarks (e.g., GTAโ†’Cityscapes), with mIoU gains of 3.2โ€“5.7 percentage points on fine-grained and visually similar categories.
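The summary mentions confidence-aware pseudo-label refinement on unlabeled target data. The paper's exact refinement rule isn't reproduced here; as a minimal sketch under common practice, a pixel keeps its argmax pseudo-label only when the model's confidence exceeds a threshold, and low-confidence pixels are masked with an ignore index (the function name and the 0.9 threshold are illustrative assumptions, not from the paper):

```python
import numpy as np

def refine_pseudo_labels(probs: np.ndarray,
                         threshold: float = 0.9,
                         ignore_index: int = 255) -> np.ndarray:
    """Confidence-aware pseudo-label refinement (illustrative sketch).

    probs: (H, W, C) softmax probabilities for one unlabeled target image.
    Returns (H, W) hard pseudo-labels, with low-confidence pixels set to
    ignore_index so they contribute no gradient to the segmentation loss.
    """
    labels = probs.argmax(axis=-1)       # hard pseudo-labels per pixel
    confidence = probs.max(axis=-1)      # per-pixel max class probability
    labels[confidence < threshold] = ignore_index
    return labels
```

In practice the threshold is often scheduled or set per class so that tail classes, which rarely reach high confidence, are not filtered out entirely.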

๐Ÿ“ Abstract
Domain Adaptation (DA) and Semi-supervised Learning (SSL) converge in Semi-supervised Domain Adaptation (SSDA), where the objective is to transfer knowledge from a source domain to a target domain using a combination of limited labeled target samples and abundant unlabeled target data. Although intuitive, a simple amalgamation of DA and SSL is suboptimal in semantic segmentation for two major reasons: (1) previous methods, while able to learn good segmentation boundaries, are prone to confuse classes with similar visual appearance due to limited supervision; and (2) a skewed, imbalanced training distribution favors source representation learning while impeding exploration of the limited information about tail classes. Language guidance can serve as a pivotal semantic bridge, facilitating robust class discrimination and mitigating visual ambiguities by leveraging the rich semantic relationships encoded in pre-trained language models to enhance feature representations across domains. Therefore, we propose the first language-guided SSDA setting for semantic segmentation in this work. Specifically, we harness the semantic generalization capabilities inherent in vision-language models (VLMs) to establish a synergistic framework within the SSDA paradigm. To address the inherent class-imbalance challenges in long-tailed distributions, we introduce class-balanced segmentation loss formulations that effectively regularize the learning process. Through extensive experimentation across diverse domain adaptation scenarios, our approach demonstrates substantial performance improvements over contemporary state-of-the-art (SoTA) methodologies. Code is available at [GitHub](https://github.com/hritam-98/SemiDAViL).
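The abstract describes language guidance as a "semantic bridge": class-name text embeddings from a pretrained VLM anchor visually ambiguous classes in a shared semantic space. A minimal sketch of this idea, assuming CLIP-style text embeddings of class-name prompts (the function name, the einsum formulation, and the 0.07 temperature are illustrative assumptions, not the paper's exact head):

```python
import numpy as np

def language_guided_logits(pixel_feats: np.ndarray,
                           text_embeds: np.ndarray,
                           temperature: float = 0.07) -> np.ndarray:
    """Score each pixel against each class via cosine similarity with
    text embeddings (illustrative sketch of VLM-guided classification).

    pixel_feats: (H, W, D) per-pixel visual features.
    text_embeds: (C, D) text encodings of class-name prompts,
                 e.g. "a photo of a {class}".
    Returns (H, W, C) class logits.
    """
    v = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # cosine similarity per (pixel, class) pair, sharpened by a temperature
    return np.einsum('hwd,cd->hwc', v, t) / temperature
```

Because the text embeddings are fixed across domains, they give source and target pixels a common classification target, which is what lets the language side act as the cross-domain bridge.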
Problem

Research questions and friction points this paper is trying to address.

Enhance semantic segmentation via vision-language guidance in SSDA
Address class confusion due to limited supervision in DA and SSL
Mitigate class-imbalance in long-tailed data distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language models for semantic bridging
Introduces class-balanced segmentation loss formulations
Combines domain adaptation with semi-supervised learning
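The paper's gradient-balanced, class-balanced loss formulation is not spelled out on this page. As a minimal sketch of the general idea, an inverse-frequency-weighted cross-entropy upweights tail classes so they contribute meaningful gradient despite their rarity (both function names, the mean-1 normalization, and the explicit per-pixel loop are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def class_balanced_weights(label_map: np.ndarray,
                           num_classes: int,
                           ignore_index: int = 255) -> np.ndarray:
    """Inverse-frequency class weights, normalized to mean 1 so the
    overall loss scale is unchanged (illustrative sketch)."""
    valid = label_map[label_map != ignore_index]
    counts = np.bincount(valid, minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    weights = 1.0 / np.maximum(freq, 1e-6)   # rare classes get large weights
    return weights / weights.mean()

def balanced_cross_entropy(probs: np.ndarray,
                           labels: np.ndarray,
                           weights: np.ndarray,
                           ignore_index: int = 255) -> float:
    """Cross-entropy where each pixel is scaled by its class weight;
    ignored pixels contribute nothing."""
    h, w, _ = probs.shape
    loss, n = 0.0, 0
    for i in range(h):
        for j in range(w):
            y = labels[i, j]
            if y == ignore_index:
                continue
            loss += -weights[y] * np.log(max(probs[i, j, y], 1e-12))
            n += 1
    return loss / max(n, 1)
```

In a real training loop this would be a vectorized, per-batch weighted loss (e.g. the `weight` argument of a framework's cross-entropy), with weights estimated from the labeled source and target data rather than a single label map.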
๐Ÿ”Ž Similar Papers
No similar papers found.