Instance-Guided Unsupervised Domain Adaptation for Robotic Semantic Segmentation

πŸ“… 2026-02-01
πŸ€– AI Summary
This work addresses the significant performance degradation of robotic semantic segmentation when deployment environments diverge from the training data distribution, a challenge exacerbated by the sensitivity of existing unsupervised domain adaptation (UDA) methods to cross-view instance-level inconsistencies. To overcome this limitation, the paper introduces the zero-shot instance segmentation capability of foundation models into a UDA framework for the first time. Leveraging 3D voxel maps constructed by the robot, the method generates multi-view consistent pseudo-labels and refines their quality through instance-level consistency constraints, enabling self-supervised fine-tuning without target-domain annotations. Experiments on real-world datasets show that the proposed approach substantially outperforms state-of-the-art multi-view consistency methods, improving generalization and segmentation accuracy in the target domain.

πŸ“ Abstract
Semantic segmentation networks, which are essential for robotic perception, often suffer from performance degradation when the visual distribution of the deployment environment differs from that of the source dataset on which they were trained. Unsupervised Domain Adaptation (UDA) addresses this challenge by adapting the network to the robot's target environment without external supervision, leveraging the large amounts of data a robot might naturally collect during long-term operation. In such settings, UDA methods can exploit multi-view consistency across the environment's map to fine-tune the model in an unsupervised fashion and mitigate domain shift. However, these approaches remain sensitive to cross-view instance-level inconsistencies. In this work, we propose a method that starts from a volumetric 3D map to generate multi-view consistent pseudo-labels. We then refine these labels using the zero-shot instance segmentation capabilities of a foundation model, enforcing instance-level coherence. The refined annotations serve as supervision for self-supervised fine-tuning, enabling the robot to adapt its perception system at deployment time. Experiments on real-world data demonstrate that our approach consistently improves performance over state-of-the-art UDA baselines based on multi-view consistency, without requiring any ground-truth labels in the target domain.
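The instance-level refinement step described in the abstract can be sketched generically as a majority vote over each instance mask: pixels inside one object proposal are forced to share a single semantic class. This is an illustrative scheme under assumed inputs (a per-pixel pseudo-label map from the 3D volumetric projection and boolean instance masks from a zero-shot segmenter), not the paper's exact formulation; the function and variable names are hypothetical.

```python
import numpy as np

def refine_pseudo_labels(pseudo_labels, instance_masks, ignore_label=255):
    """Enforce instance-level coherence on per-pixel pseudo-labels.

    pseudo_labels: (H, W) int array of semantic classes projected from a
                   multi-view 3D map; ignore_label marks unlabeled pixels.
    instance_masks: list of (H, W) boolean arrays, one per instance
                    proposed by a zero-shot instance segmenter.
    """
    refined = pseudo_labels.copy()
    for mask in instance_masks:
        labels = pseudo_labels[mask]
        labels = labels[labels != ignore_label]
        if labels.size == 0:
            continue  # no evidence inside this instance; leave it unlabeled
        # Assign the majority class to the whole instance, removing
        # cross-view label fragmentation within a single object.
        majority = int(np.bincount(labels).argmax())
        refined[mask] = majority
    return refined
```

The refined map would then serve as the supervision signal for self-supervised fine-tuning of the segmentation network in the target domain.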
Problem

Research questions and friction points this paper is trying to address.

Unsupervised Domain Adaptation
Semantic Segmentation
Instance-level Consistency
Robotic Perception
Domain Shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised Domain Adaptation
Instance-level Consistency
3D Volumetric Map
Foundation Model
Zero-shot Instance Segmentation
Michele Antonazzi
Department of Computer Science, University of Milan, Milano, Italy
Lorenzo Signorelli
Department of Computer Science, University of Milan, Milano, Italy
Matteo Luperto
Assistant Professor, UniversitΓ  degli Studi di Milano
semantic mapping, robotics, assistive robotics
Nicola Basilico
University of Milan
Robotics, Artificial Intelligence, Multiagent systems