🤖 AI Summary
This paper identifies a critical yet overlooked issue in unsupervised domain adaptation (UDA) and unsupervised domain expansion (UDE): cross-domain visual ambiguity—i.e., certain samples appear visually closer to samples from a foreign domain than to those from their native domain, leading domain-specific models to misclassify them. To address this, the authors propose a Co-Teaching (CT) framework—the first to explicitly model and exploit this ambiguity—where two teacher networks dynamically guide a student network through mutual interaction, balancing domain specificity and cross-domain generalization. They introduce two variants: knowledge-distillation-based CT (kdCT) and mixup-augmented CT (miCT). Evaluated on four benchmarks spanning image classification and driving-scene semantic segmentation, the approach significantly improves target-domain accuracy while maintaining or even enhancing source-domain performance—thereby jointly optimizing both UDA and UDE objectives.
📝 Abstract
Unsupervised Domain Adaptation (UDA) essentially trades a model's performance on a source domain for improved performance on a target domain. To resolve this issue, Unsupervised Domain Expansion (UDE) has been proposed recently. UDE adapts the model to the target domain as UDA does, while in the meantime maintaining its source-domain performance. In both UDA and UDE settings, a model tailored to a given domain, be it the source or the target domain, is assumed to handle samples from that domain well. We question this assumption by reporting the existence of cross-domain visual ambiguity: given the lack of a crystal-clear boundary between the two domains, samples from one domain can be visually close to the other domain. Such samples are typically in the minority in their host domain, so they tend to be overlooked by the domain-specific model, yet they can be better handled by a model from the other domain. We exploit this finding and accordingly propose Co-Teaching (CT). The CT method is instantiated as knowledge-distillation-based CT (kdCT) plus mixup-based CT (miCT). Specifically, kdCT transfers knowledge from a leading-teacher network and an assistant-teacher network to a student network, so that the cross-domain ambiguity is better handled by the student. Meanwhile, miCT further enhances the generalization ability of the student. Extensive experiments on two image-classification datasets and two driving-scene segmentation datasets justify the viability of CT for UDA and UDE.
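To make the two ingredients concrete, the following is a minimal NumPy sketch of the two generic building blocks the abstract names: a two-teacher distillation loss (a student matching a weighted mixture of a leading-teacher's and an assistant-teacher's soft predictions) and standard mixup augmentation. The weighting `alpha`, temperature `T`, and the exact form of the loss are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_teacher_kd_loss(student_logits, lead_logits, assist_logits,
                        alpha=0.7, T=2.0):
    """Hypothetical kdCT-style loss: cross-entropy between the student's
    softened prediction and a convex mix of the two teachers' predictions.
    `alpha` (lead-teacher weight) and `T` are illustrative choices."""
    p_student = softmax(student_logits, T)
    p_teacher = alpha * softmax(lead_logits, T) \
                + (1.0 - alpha) * softmax(assist_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean())

def mixup(x1, x2, y1, y2, lam=0.6):
    """Standard mixup: convex combination of two inputs and their
    (one-hot) labels, controlled by lam in [0, 1]."""
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

In practice the teachers would be frozen source- and target-domain networks and `lam` would be drawn from a Beta distribution per batch; the sketch only shows the loss and augmentation shapes, not the full training loop.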