AI Summary
Visual object tracking suffers significant performance degradation under adverse weather conditions (e.g., nighttime, fog) due to domain shift. To address this, we propose a unified domain adaptation framework. Our method comprises three key components: (1) a controllable scene generator guided by text prompts to synthesize a small set of unlabeled multi-weather videos, alleviating real-data scarcity; (2) a lightweight Domain-Customized Adapter (DCA) enabling plug-and-play, rapid domain transfer without modifying the backbone; and (3) a Target-aware Confidence Alignment (TCA) module leveraging optimal transport theory to enhance cross-domain localization consistency. Crucially, our approach requires neither fine-tuning nor retraining of the backbone network. Evaluated on multiple adverse-weather tracking benchmarks, it substantially outperforms state-of-the-art methods, demonstrating strong generalization capability and engineering practicality.
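The DCA component above keeps the backbone frozen and learns only a small per-domain module. A minimal sketch of that idea, using a bottleneck residual adapter in plain NumPy: the class name, dimensions, and zero-initialization are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

class BottleneckAdapter:
    """Hypothetical bottleneck adapter in the spirit of DCA: a small
    residual module attached to a frozen backbone layer. Only the two
    low-rank matrices below would be trained per weather domain."""

    def __init__(self, dim=256, bottleneck=16, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Zero-initialized up-projection: the adapter starts as an
        # exact identity, so plugging it in cannot perturb the
        # source-domain (daytime) tracker before adaptation.
        self.up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        # Residual form: frozen feature plus a low-rank,
        # domain-specific correction.
        h = np.maximum(x @ self.down, 0.0)  # down-project + ReLU
        return x + h @ self.up

feats = np.ones((4, 256))       # stand-in for frozen backbone features
adapter = BottleneckAdapter()
out = adapter(feats)
assert np.allclose(out, feats)  # identity at initialization
```

With `dim=256` and `bottleneck=16`, the adapter adds only 2 × 256 × 16 = 8,192 parameters per domain, which is why such a module can be swapped in per weather condition without touching the backbone.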
Abstract
Visual object tracking has made promising progress over the past decades. Most existing approaches focus on learning target representations from well-conditioned daytime data; in unconstrained real-world scenarios with adverse weather conditions, e.g., nighttime or foggy environments, the tremendous domain shift leads to significant performance degradation. In this paper, we propose UMDATrack, which maintains high-quality target state prediction under various adverse weather conditions within a unified domain adaptation framework. Specifically, we first use a controllable scenario generator to synthesize a small amount of unlabeled videos (less than 2% of the frames in the source daytime datasets) in multiple weather conditions under the guidance of different text prompts. Afterwards, we design a simple yet effective domain-customized adapter (DCA), allowing the target objects' representation to rapidly adapt to various weather conditions without redundant model updating. Furthermore, to enhance the localization consistency between the source and target domains, we propose a target-aware confidence alignment module (TCA) based on optimal transport theory. Extensive experiments demonstrate that UMDATrack surpasses existing advanced visual trackers and sets new state-of-the-art performance by a significant margin. Our code is available at https://github.com/Z-Z188/UMDATrack.
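The TCA module rests on optimal transport: a transport plan between two confidence distributions whose marginals match them, so its cost can serve as an alignment signal. A minimal sketch of the underlying machinery, using entropy-regularized Sinkhorn iterations on toy flattened confidence maps (the maps, cost matrix, and hyperparameters are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.
    a, b: marginal distributions (sum to 1); C: pairwise cost matrix.
    Returns the transport plan P with row marginals a, column marginals b."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):      # alternate scaling to match marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy confidence maps from a source (daytime) and a target (e.g. foggy)
# prediction head, flattened and normalized to probability distributions.
src = np.array([0.7, 0.2, 0.1])
tgt = np.array([0.1, 0.3, 0.6])
# Illustrative cost: distance between (flattened) spatial positions.
C = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)

P = sinkhorn(src, tgt, C)
assert np.allclose(P.sum(), 1.0)            # total mass preserved
assert np.allclose(P.sum(axis=1), src)      # row marginals = source map
assert np.allclose(P.sum(axis=0), tgt, atol=1e-4)  # columns ≈ target map
```

The scalar `(P * C).sum()` is the (regularized) transport cost; minimizing such a cost between source- and target-domain confidence maps is one standard way to encourage the cross-domain localization consistency the module targets.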