Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In source-free unsupervised video domain adaptation (SFUVDA), high pseudo-label noise and model overconfidence severely degrade cross-domain generalization when the source domain is unavailable. To address this, we propose a CLIP-augmented curriculum self-training framework guided by a teacher model. Our method jointly leverages collaborative self-training, contrastive vision-language priors from CLIP, and reliability-driven pseudo-label refinement. Key contributions include: (1) a novel bidirectional prediction alignment mechanism with reliability-weighted curriculum learning to improve pseudo-label quality; and (2) an adaptive curriculum regularization that dynamically suppresses both label noise and overconfidence bias by jointly measuring prediction confidence and stability. Extensive experiments on multiple video domain adaptation benchmarks demonstrate that our approach significantly outperforms existing source-free methods, achieving more robust and accurate cross-domain video recognition.

📝 Abstract
Recent advances in Source-Free Unsupervised Video Domain Adaptation (SFUVDA) leverage vision-language models to enhance pseudo-label generation. However, challenges such as noisy pseudo-labels and over-confident predictions limit how well these methods adapt across domains. We propose Co-STAR, a novel framework that integrates curriculum learning with collaborative self-training between a source-trained teacher and a contrastive vision-language model (CLIP). Our curriculum learning approach employs a reliability-based weight function that measures bidirectional prediction alignment between the teacher and CLIP, balancing confident and uncertain predictions. This function preserves uncertainty for difficult samples while prioritizing reliable pseudo-labels when the two models' predictions closely align. To further improve adaptation, we propose Adaptive Curriculum Regularization, which adjusts the learning priority of samples in a probabilistic, adaptive manner based on their confidence scores and prediction stability, mitigating overfitting to noisy and over-confident samples. Extensive experiments across multiple video domain adaptation benchmarks demonstrate that Co-STAR consistently outperforms state-of-the-art SFUVDA methods. Code is available at: https://github.com/Plrbear/Co-Star
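The abstract does not give the exact form of the reliability weight, but the idea of "bidirectional prediction alignment" can be sketched with a symmetric KL divergence between the teacher's and CLIP's class distributions, mapped so that closely aligned predictions receive a weight near 1. This is a minimal illustrative sketch, not the paper's actual formula; the exponential mapping and the symmetric-KL choice are assumptions.

```python
import numpy as np

def reliability_weight(p_teacher, p_clip, eps=1e-8):
    """Hypothetical reliability weight from bidirectional prediction alignment.

    Uses the symmetric KL divergence between the teacher's and CLIP's class
    probability distributions (an assumption; the paper only states that the
    weight measures bidirectional alignment). Returns ~1 when the two models
    agree and decays toward 0 as their predictions diverge.
    """
    p_t = np.clip(np.asarray(p_teacher, dtype=float), eps, 1.0)
    p_c = np.clip(np.asarray(p_clip, dtype=float), eps, 1.0)
    # KL in both directions: teacher -> CLIP and CLIP -> teacher.
    kl_tc = np.sum(p_t * np.log(p_t / p_c))
    kl_ct = np.sum(p_c * np.log(p_c / p_t))
    # Map the averaged divergence into (0, 1]: 1 means full agreement.
    return float(np.exp(-0.5 * (kl_tc + kl_ct)))
```

Under this sketch, a pseudo-label where both models put most of their mass on the same class gets a weight near 1 and dominates early curriculum steps, while disagreements keep their uncertainty and are down-weighted.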
Problem

Research questions and friction points this paper is trying to address.

Noisy pseudo-labels degrade source-free video domain adaptation
Imbalance between confident and uncertain predictions during self-training
Overfitting to noisy and over-confident samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum learning with bidirectional prediction alignment
Adaptive Curriculum Regularization for confidence-based prioritization
Collaborative self-training between teacher and CLIP model
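The Adaptive Curriculum Regularization is described as probabilistically adjusting sample priority from confidence and prediction stability. One plausible reading is sketched below: stability is assumed to be a score in [0, 1] (e.g., agreement of the current prediction with an exponential moving average of past predictions), and over-confident yet unstable samples are stochastically dropped from the loss. The threshold `tau_conf`, scale `alpha`, and the keep-probability formula are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def keep_probability(confidence, stability, tau_conf=0.95, alpha=4.0):
    """Hypothetical keep-probability for a pseudo-labeled sample.

    Samples below the confidence threshold are always kept; above it, the
    keep-probability shrinks when predictions are unstable, suppressing
    over-confident noise-fitting (all constants are illustrative).
    """
    overconfidence = max(0.0, confidence - tau_conf) / (1.0 - tau_conf)
    # Stable over-confident predictions stay trusted; unstable ones are
    # the likely label-noise cases and get probabilistically removed.
    return float(np.clip(1.0 - alpha * overconfidence * (1.0 - stability), 0.0, 1.0))

def select_samples(confidences, stabilities):
    """Stochastically choose which samples contribute to the training loss."""
    return [rng.random() < keep_probability(c, s)
            for c, s in zip(confidences, stabilities)]
```

The stochastic (rather than hard-threshold) selection matches the abstract's "probabilistic, adaptive" phrasing: borderline samples are sometimes included, so the curriculum does not permanently discard them.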