Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

📅 2024-04-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unsupervised domain adaptation (UDA) for panoptic segmentation, adding instance-level adaptation on the unlabeled target domain tends to degrade semantic segmentation. This paper proposes LIDAPS to address that trade-off, with three components: (1) IMix, an instance-aware cross-domain mixing strategy that pastes high-confidence predicted target-domain instances onto source images; (2) CLIP-based domain alignment (CDA), which regularizes the semantic branch by exploiting the domain robustness of natural-language prompts; and (3) an end-to-end model that combines the two. The authors report state-of-the-art results on all popular panoptic UDA benchmarks, with IMix improving panoptic quality (PQ) through better instance segmentation and CDA counteracting the accompanying loss in semantic performance.
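
For readers unfamiliar with the metric, panoptic quality is standardly computed per class from predicted and ground-truth segments matched at IoU > 0.5, then averaged over classes. The helper below is a minimal sketch of the per-class formula only; the function name and its inputs are illustrative assumptions, not code from the paper.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class panoptic quality (illustrative helper, not from the paper).

    matched_ious: IoU values of true-positive matches (each > 0.5).
    num_fp: number of unmatched predicted segments.
    num_fn: number of unmatched ground-truth segments.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No segments of this class in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denom
```

Because the denominator counts unmatched segments, the score penalizes both missed and spurious segments, not only the mask quality of the matched ones.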

📝 Abstract
The increasing relevance of panoptic segmentation is tied to the advancements in autonomous driving and AR/VR applications. However, the deployment of such models has been limited due to the expensive nature of dense data annotation, giving rise to unsupervised domain adaptation (UDA). A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domain while harmonizing the subtasks of semantic and instance segmentation to limit catastrophic interference. While considerable progress has been achieved, existing approaches mainly focus on the adaptation of semantic segmentation. In this work, we focus on incorporating instance-level adaptation via a novel instance-aware cross-domain mixing strategy IMix. IMix significantly enhances the panoptic quality by improving instance segmentation performance. Specifically, we propose inserting high-confidence predicted instances from the target domain onto source images, retaining the exhaustiveness of the resulting pseudo-labels while reducing the injected confirmation bias. Nevertheless, such an enhancement comes at the cost of degraded semantic performance, attributed to catastrophic forgetting. To mitigate this issue, we regularize our semantic branch by employing CLIP-based domain alignment (CDA), exploiting the domain-robustness of natural language prompts. Finally, we present an end-to-end model incorporating these two mechanisms called LIDAPS, achieving state-of-the-art results on all popular panoptic UDA benchmarks.
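
The instance insertion described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name, the argument layout (boolean masks with per-instance class IDs and confidence scores), and the 0.9 threshold are all hypothetical, and only semantic labels are tracked.

```python
import numpy as np

def paste_confident_instances(src_img, src_label, tgt_img,
                              tgt_masks, tgt_classes, tgt_scores,
                              conf_thresh=0.9):
    """Paste confident target-domain instance predictions onto a source image.

    src_img, tgt_img: (H, W, 3) arrays; src_label: (H, W) class IDs.
    tgt_masks: list of (H, W) boolean masks predicted on the target image.
    conf_thresh is an assumed value, not taken from the paper.
    """
    mixed_img = src_img.copy()
    mixed_label = src_label.copy()
    pasted = np.zeros(src_label.shape, dtype=bool)
    for mask, cls, score in zip(tgt_masks, tgt_classes, tgt_scores):
        if score < conf_thresh:
            continue  # skip uncertain predictions to limit confirmation bias
        mixed_img[mask] = tgt_img[mask]   # copy target pixels of the instance
        mixed_label[mask] = cls           # pseudo-label only the pasted region
        pasted |= mask
    return mixed_img, mixed_label, pasted
```

In this sketch every pixel outside the pasted regions keeps its source ground-truth label, which is one way to read the abstract's point that the mixed labels stay exhaustive while only high-confidence pseudo-labels are introduced.
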
Problem

Research questions and friction points this paper is trying to address.

Panoptic Segmentation
Unsupervised Domain Adaptation
Semantic-Instance Disentanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

IMix
CDA (CLIP-based domain alignment)
LIDAPS