🤖 AI Summary
Weakly supervised video anomaly detection (WSVAD) models are trained with video-level labels only, which makes frame-level adversarial defense difficult: video-level adversarial perturbations are too weak to be useful, while frame-level adversarial training on the model's own noisy pseudo-labels severely degrades performance. To address this, we propose a method that makes frame-level adversarial training feasible. Our core contributions are: (1) Spatiotemporal Region Distortion (SRD), a pseudo-anomaly generation mechanism that applies severe augmentations to localized regions of normal videos while preserving temporal consistency, producing synthetic anomalies with precise frame-level annotations; and (2) a strategy that combines these synthetic anomalies with the noisy pseudo-labels to reduce the label noise inherent in weak supervision. Extensive experiments demonstrate an average improvement of 71.0% in overall AUROC over state-of-the-art methods across multiple benchmarks. The source code is publicly available.
📝 Abstract
Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advancements, yet existing models remain vulnerable to adversarial attacks, limiting their reliability. Weak supervision provides only video-level labels even though the task requires frame-level predictions, so traditional adversarial defense mechanisms such as adversarial training are not effective: video-level adversarial perturbations are typically weak and inadequate. Pseudo-labels generated directly from the model could enable frame-level adversarial training; however, these pseudo-labels are inherently noisy and significantly degrade performance. We therefore introduce a novel Pseudo-Anomaly Generation method called Spatiotemporal Region Distortion (SRD), which creates synthetic anomalies by applying severe augmentations to localized regions in normal videos while preserving temporal consistency. Integrating these precisely annotated synthetic anomalies with the noisy pseudo-labels substantially reduces label noise, enabling effective adversarial training. Extensive experiments demonstrate that our method significantly enhances the robustness of WSVAD models against adversarial attacks, outperforming state-of-the-art methods by an average of 71.0% in overall AUROC performance across multiple benchmarks. The implementation and code are publicly available at https://github.com/rohban-lab/FrameShield.
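To make the SRD idea concrete, the following is a minimal NumPy sketch of region-level pseudo-anomaly generation. It is not the authors' implementation (see the repository linked above for that). The function name `distort_region`, its parameters, and the specific distortion (a fixed pixel shuffle followed by intensity inversion) are assumptions chosen for illustration. The sketch shows only the two properties the abstract states: a severe augmentation confined to a localized region, applied identically on consecutive frames so that it stays temporally consistent, together with exact frame-level labels for the distorted segment.

```python
import numpy as np


def distort_region(video, rng, seg_frac=0.5, box_frac=0.4):
    """Illustrative pseudo-anomaly generator (hypothetical, not the paper's code).

    video: float array of shape (T, H, W, C) with values in [0, 1].
    Returns the distorted copy and a length-T array of frame-level labels.
    """
    T, H, W, _ = video.shape

    # Choose a contiguous temporal segment to corrupt.
    seg_len = min(T, max(1, int(round(T * seg_frac))))
    t0 = int(rng.integers(0, T - seg_len + 1))

    # Choose one spatial box, reused for every frame in the segment.
    bh = min(H, max(1, int(round(H * box_frac))))
    bw = min(W, max(1, int(round(W * box_frac))))
    y0 = int(rng.integers(0, H - bh + 1))
    x0 = int(rng.integers(0, W - bw + 1))

    # One pixel permutation shared by all frames keeps the distortion
    # temporally consistent (assumed choice of "severe augmentation").
    perm = rng.permutation(bh * bw)

    out = video.copy()
    patch = out[t0:t0 + seg_len, y0:y0 + bh, x0:x0 + bw, :]
    flat = patch.reshape(seg_len, bh * bw, -1)
    flat = 1.0 - flat[:, perm, :]  # shuffle pixels, then invert intensities
    out[t0:t0 + seg_len, y0:y0 + bh, x0:x0 + bw, :] = flat.reshape(seg_len, bh, bw, -1)

    # Frame-level labels are exact because we know which frames were altered.
    labels = np.zeros(T, dtype=np.int64)
    labels[t0:t0 + seg_len] = 1
    return out, labels


rng = np.random.default_rng(0)
normal_clip = rng.random((8, 16, 16, 3))
fake_anomaly, frame_labels = distort_region(normal_clip, rng)
```

Because the labels come from the generation process rather than from model predictions, clips produced this way can supply clean frame-level targets alongside the noisier pseudo-labels described in the abstract.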