Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In source-free unsupervised video domain adaptation (SFUVDA), high pseudo-label noise and model overconfidence severely degrade cross-domain generalization when the source domain is unavailable. To address this, we propose a CLIP-augmented curriculum self-training framework guided by a teacher model. Our method jointly leverages collaborative self-training, contrastive vision-language priors from CLIP, and reliability-driven pseudo-label refinement. Key contributions include: (1) a novel bidirectional prediction alignment mechanism with reliability-weighted curriculum learning to improve pseudo-label quality; and (2) an adaptive curriculum regularization that dynamically suppresses both label noise and overconfidence bias by jointly measuring prediction confidence and stability. Extensive experiments on multiple video domain adaptation benchmarks demonstrate that our approach significantly outperforms existing source-free methods, achieving more robust and accurate cross-domain video recognition.

📝 Abstract
Recent advances in Source-Free Unsupervised Video Domain Adaptation (SFUVDA) leverage vision-language models to enhance pseudo-label generation. However, challenges such as noisy pseudo-labels and over-confident predictions limit how well these methods adapt across domains. We propose Co-STAR, a novel framework that integrates curriculum learning with collaborative self-training between a source-trained teacher and a contrastive vision-language model (CLIP). Our curriculum learning approach employs a reliability-based weight function that measures bidirectional prediction alignment between the teacher and CLIP, balancing confident and uncertain predictions. This function preserves uncertainty for difficult samples while prioritizing reliable pseudo-labels when the two models' predictions closely align. To further improve adaptation, we propose Adaptive Curriculum Regularization, which adjusts the learning priority of samples in a probabilistic, adaptive manner based on their confidence scores and prediction stability, mitigating overfitting to noisy and over-confident samples. Extensive experiments across multiple video domain adaptation benchmarks demonstrate that Co-STAR consistently outperforms state-of-the-art SFUVDA methods. Code is available at: https://github.com/Plrbear/Co-Star
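The abstract does not give the exact form of the reliability weight, but the idea of "bidirectional prediction alignment" can be sketched with a symmetric KL divergence between the teacher's and CLIP's class distributions, mapped so that closely aligned predictions receive a weight near 1. This is a minimal illustrative sketch, not the paper's actual formula; the exponential mapping and the symmetric-KL choice are assumptions.

```python
import numpy as np

def reliability_weight(p_teacher, p_clip, eps=1e-8):
    """Hypothetical reliability weight from bidirectional prediction alignment.

    Uses the symmetric KL divergence between the teacher's and CLIP's class
    probability distributions (an assumption; the paper only states that the
    weight measures bidirectional alignment). Returns ~1 when the two models
    agree and decays toward 0 as their predictions diverge.
    """
    p_t = np.clip(np.asarray(p_teacher, dtype=float), eps, 1.0)
    p_c = np.clip(np.asarray(p_clip, dtype=float), eps, 1.0)
    # KL in both directions: teacher -> CLIP and CLIP -> teacher.
    kl_tc = np.sum(p_t * np.log(p_t / p_c))
    kl_ct = np.sum(p_c * np.log(p_c / p_t))
    # Map the averaged divergence into (0, 1]: 1 means full agreement.
    return float(np.exp(-0.5 * (kl_tc + kl_ct)))
```

Under this sketch, a pseudo-label where both models put most of their mass on the same class gets a weight near 1 and dominates early curriculum steps, while disagreements keep their uncertainty and are down-weighted.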
Problem

Research questions and friction points this paper is trying to address.

Noisy pseudo-labels degrade source-free video domain adaptation
Imbalance between confident and uncertain predictions during self-training
Overfitting to noisy and over-confident samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum learning with bidirectional prediction alignment
Adaptive Curriculum Regularization for confidence-based prioritization
Collaborative self-training between teacher and CLIP model
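The Adaptive Curriculum Regularization is described as probabilistically adjusting sample priority from confidence and prediction stability. One plausible reading is sketched below: stability is assumed to be a score in [0, 1] (e.g., agreement of the current prediction with an exponential moving average of past predictions), and over-confident yet unstable samples are stochastically dropped from the loss. The threshold `tau_conf`, scale `alpha`, and the keep-probability formula are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def keep_probability(confidence, stability, tau_conf=0.95, alpha=4.0):
    """Hypothetical keep-probability for a pseudo-labeled sample.

    Samples below the confidence threshold are always kept; above it, the
    keep-probability shrinks when predictions are unstable, suppressing
    over-confident noise-fitting (all constants are illustrative).
    """
    overconfidence = max(0.0, confidence - tau_conf) / (1.0 - tau_conf)
    # Stable over-confident predictions stay trusted; unstable ones are
    # the likely label-noise cases and get probabilistically removed.
    return float(np.clip(1.0 - alpha * overconfidence * (1.0 - stability), 0.0, 1.0))

def select_samples(confidences, stabilities):
    """Stochastically choose which samples contribute to the training loss."""
    return [rng.random() < keep_probability(c, s)
            for c, s in zip(confidences, stabilities)]
```

The stochastic (rather than hard-threshold) selection matches the abstract's "probabilistic, adaptive" phrasing: borderline samples are sometimes included, so the curriculum does not permanently discard them.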