🤖 AI Summary
Existing unsupervised domain adaptation (UDA) methods fail when the source and target domains are entirely heterogeneous modalities (e.g., RGB and LiDAR). To address this, we propose a new setting: Heterogeneous-Modal Unsupervised Domain Adaptation (HMUDA). To enable cross-modal knowledge transfer, we design Latent Space Bridging (LSB), a dual-branch framework: one branch learns modality-invariant features, while the other jointly optimizes cross-modal feature consistency and cross-domain class-centroid alignment via a bridge domain. We further incorporate unsupervised semantic-segmentation modeling to mitigate pseudo-label noise. Our method achieves state-of-the-art performance on six heterogeneous-modality benchmarks, significantly improving cross-modal transfer accuracy. To the best of our knowledge, this is the first work to systematically formulate and solve the HMUDA problem, addressing both its modeling challenges and its alignment difficulties in a unified framework.
📝 Abstract
Unsupervised domain adaptation (UDA) methods effectively bridge domain gaps but struggle when the source and target domains belong to entirely distinct modalities. To address this limitation, we propose a novel setting called Heterogeneous-Modal Unsupervised Domain Adaptation (HMUDA), which enables knowledge transfer between completely different modalities by leveraging a bridge domain containing unlabeled samples from both modalities. To learn under the HMUDA setting, we propose Latent Space Bridging (LSB), a specialized framework designed for the semantic segmentation task. Specifically, LSB adopts a dual-branch architecture, incorporating a feature consistency loss to align representations across modalities and a domain alignment loss to reduce discrepancies between class centroids across domains. Extensive experiments on six benchmark datasets demonstrate that LSB achieves state-of-the-art performance.
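The two losses described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's exact formulation: the function names, the choice of mean-squared error for feature consistency, and the squared-Euclidean distance between class centroids are all hypothetical stand-ins for the actual LSB objectives.

```python
import numpy as np

def feature_consistency_loss(feats_a, feats_b):
    # Hypothetical consistency term: MSE between the two branches'
    # features for the same bridge-domain samples (shape: [N, D]).
    return np.mean((feats_a - feats_b) ** 2)

def class_centroids(feats, labels, num_classes):
    # Mean feature vector per (pseudo-)class; assumes every class
    # appears at least once in `labels`.
    return np.stack(
        [feats[labels == c].mean(axis=0) for c in range(num_classes)]
    )

def domain_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels, num_classes):
    # Hypothetical alignment term: average squared distance between
    # source-domain and target-domain centroids of the same class.
    c_src = class_centroids(src_feats, src_labels, num_classes)
    c_tgt = class_centroids(tgt_feats, tgt_labels, num_classes)
    return np.mean(np.sum((c_src - c_tgt) ** 2, axis=1))
```

In practice such terms would be computed on mini-batch features from the two modality branches and added, with weighting coefficients, to the supervised segmentation loss on the labeled source domain.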