AI Summary
This paper proposes PseudoFormer, a dual-branch framework that addresses three key challenges in weakly supervised temporal action localization (WTAL): low-quality pseudo-labels, insufficient exploitation of multi-granularity priors, and difficulty in training with noisy labels. The method introduces: (1) RickerFusion, a novel global feature mapping mechanism that enables high-fidelity cross-scale pseudo-label generation; (2) joint modeling of snippet-level and proposal-level weak supervision, establishing dual-path supervision for complementary granularity-aware learning; and (3) an uncertainty-aware masking strategy coupled with iterative pseudo-label refinement and an uncertainty-weighted loss to enhance robustness against label noise. Evaluated on THUMOS14 and ActivityNet 1.3, PseudoFormer achieves state-of-the-art WTAL performance, significantly narrowing the gap between weakly and fully supervised approaches. Ablation studies validate the effectiveness of each component.
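To make the third component more concrete, the following is a minimal sketch of how an uncertainty mask and an uncertainty-weighted loss could be combined. It is not the paper's formulation: the function name `uncertainty_weighted_bce`, the per-snippet `uncertainty` input in [0, 1], the `mask_threshold` value, and the choice of binary cross-entropy are all assumptions made for illustration.

```python
import numpy as np


def uncertainty_weighted_bce(pred, pseudo, uncertainty, mask_threshold=0.5, eps=1e-7):
    """Binary cross-entropy against pseudo labels, down-weighted by uncertainty.

    pred:        predicted action probabilities per snippet, shape (T,)
    pseudo:      pseudo labels per snippet in {0, 1}, shape (T,)
    uncertainty: assumed per-snippet uncertainty in [0, 1], shape (T,)
    Snippets with uncertainty >= mask_threshold are masked out entirely
    (hypothetical rule); the rest are weighted by (1 - uncertainty).
    """
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    pseudo = np.asarray(pseudo, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)

    # Per-snippet binary cross-entropy.
    bce = -(pseudo * np.log(pred) + (1.0 - pseudo) * np.log(1.0 - pred))

    # Hard mask for very uncertain snippets, soft weight for the others.
    keep = uncertainty < mask_threshold
    weights = (1.0 - uncertainty) * keep

    total = weights.sum()
    if total == 0:
        return 0.0  # every snippet was masked; nothing to learn from
    return float((weights * bce).sum() / total)
```

In an iterative refinement loop, the pseudo labels and uncertainties would be recomputed from the model's latest predictions before the loss is evaluated again. How the paper actually derives uncertainty is not stated in this summary.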
Abstract
Weakly-supervised Temporal Action Localization (WTAL) has achieved notable success but still suffers from a lack of temporal annotations, leading to a gap in both performance and framework design compared with fully-supervised methods. Recent approaches employ pseudo labels for training, yet three key challenges remain unresolved: generating high-quality pseudo labels, making full use of different priors, and optimizing training with noisy labels. Motivated by these challenges, we propose PseudoFormer, a novel two-branch framework that bridges the gap between weakly and fully-supervised Temporal Action Localization (TAL). We first introduce RickerFusion, which maps all predicted action proposals to a global shared space to generate higher-quality pseudo labels. Subsequently, we leverage both snippet-level and proposal-level labels, which carry different priors from the weak branch, to train the regression-based model in the full branch. Finally, we apply an uncertainty mask and an iterative refinement mechanism to train with noisy pseudo labels. PseudoFormer achieves state-of-the-art WTAL results on two commonly used benchmarks, THUMOS14 and ActivityNet1.3. Extensive ablation studies further demonstrate the contribution of each component of our method.
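The idea of mapping all proposals into one shared space can be illustrated with a small sketch. The abstract does not give RickerFusion's actual formulation, so everything below is an assumption: each proposal is rendered as a Ricker (Mexican-hat) wavelet centered on its midpoint, scaled by half its duration and weighted by its score, and the wavelets are summed on a common snippet timeline. The names `ricker`, `fuse_proposals`, and `extract_segments`, and the `(start, end, score)` proposal format, are hypothetical.

```python
import numpy as np


def ricker(t, center, width):
    """Unnormalized Ricker (Mexican-hat) wavelet centered at `center`."""
    x = (t - center) / width
    return (1.0 - x**2) * np.exp(-0.5 * x**2)


def fuse_proposals(proposals, num_snippets):
    """Sum one score-weighted wavelet per proposal onto a shared timeline.

    proposals: iterable of (start, end, score) in snippet units (assumed format).
    Returns a 1-D array of length `num_snippets`.
    """
    t = np.arange(num_snippets, dtype=float)
    fused = np.zeros(num_snippets)
    for start, end, score in proposals:
        center = 0.5 * (start + end)
        # Assumption: half the duration as scale, so the wavelet crosses
        # zero exactly at the proposal boundaries.
        width = max(0.5 * (end - start), 1e-6)
        fused += score * ricker(t, center, width)
    return fused


def extract_segments(fused, threshold):
    """Return half-open [start, end) runs where `fused` exceeds `threshold`."""
    above = fused > threshold
    padded = np.concatenate(([False], above, [False]))
    # Indices where the boolean signal flips: rising edges, then falling edges.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]
```

Under these assumptions, overlapping proposals reinforce each other inside their shared extent, while the negative side lobes of the wavelet suppress responses just outside each boundary. Thresholding the fused curve then yields candidate pseudo-label segments, for example `extract_segments(fuse_proposals([(10, 20, 0.9), (12, 22, 0.8)], 40), 0.0)`.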