FrameShield: Adversarially Robust Video Anomaly Detection

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weakly supervised video anomaly detection (WSVAD) models are trained with video-level labels only, which makes frame-level adversarial defense difficult: video-level adversarial perturbations are typically too weak for effective adversarial training, while the pseudo-labels needed for frame-level adversarial training are noisy and severely degrade performance. To address this, we propose a method that makes frame-level adversarial training workable under weak supervision. Our core contributions are: (1) Spatiotemporal Region Distortion (SRD), a pseudo-anomaly generation mechanism that applies severe augmentations to localized regions of normal videos while preserving temporal consistency, yielding synthetic anomalies with precise frame-level labels; and (2) a training strategy that combines these precisely labeled synthetic anomalies with the model's noisy pseudo-labels, reducing label noise enough for frame-level adversarial training to be effective. Extensive experiments across multiple benchmarks show an average improvement of 71.0% in overall AUROC over state-of-the-art methods. The source code is publicly available.

📝 Abstract
Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advancements, yet existing models remain vulnerable to adversarial attacks, limiting their reliability. Due to the inherent constraints of weak supervision, where only video-level labels are provided despite the need for frame-level predictions, traditional adversarial defense mechanisms, such as adversarial training, are not effective since video-level adversarial perturbations are typically weak and inadequate. To address this limitation, pseudo-labels generated directly from the model can enable frame-level adversarial training; however, these pseudo-labels are inherently noisy, significantly degrading performance. We therefore introduce a novel Pseudo-Anomaly Generation method called Spatiotemporal Region Distortion (SRD), which creates synthetic anomalies by applying severe augmentations to localized regions in normal videos while preserving temporal consistency. Integrating these precisely annotated synthetic anomalies with the noisy pseudo-labels substantially reduces label noise, enabling effective adversarial training. Extensive experiments demonstrate that our method significantly enhances the robustness of WSVAD models against adversarial attacks, outperforming state-of-the-art methods by an average of 71.0% in overall AUROC performance across multiple benchmarks. The implementation and code are publicly available at https://github.com/rohban-lab/FrameShield.
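The abstract describes SRD only at a high level. As an illustration, here is a minimal NumPy sketch of an SRD-style generator under stated assumptions: the three distortions (color inversion, 180-degree rotation, pixelation), the box-size range, the block size, and the minimum window length are hypothetical choices, not the authors' settings, and the function name `spatiotemporal_region_distortion` is illustrative. What it does take from the abstract is the core idea: the distortion is confined to a localized region of a normal video, and the same region and transform are applied across a contiguous run of frames so the synthetic anomaly stays temporally consistent, which yields exact frame-level labels for free.

```python
import numpy as np

def spatiotemporal_region_distortion(video, rng, min_len=8, region_frac=(0.2, 0.5), block=4):
    """Hypothetical SRD-style pseudo-anomaly generator (illustrative only).

    video: float array (T, H, W, C) with values in [0, 1], assumed to be a normal clip.
    Returns (distorted, frame_labels); frame_labels[t] == 1 exactly on the
    frames that were distorted, so the synthetic anomaly is precisely annotated.
    """
    T, H, W, _ = video.shape
    out = video.copy()
    labels = np.zeros(T, dtype=np.int64)

    # Contiguous temporal window [start, start + length); integers() excludes `high`.
    length = int(rng.integers(min(min_len, T), T + 1))
    start = int(rng.integers(0, T - length + 1))

    # One localized box, kept fixed across the window for temporal consistency.
    frac = rng.uniform(*region_frac)
    h, w = max(1, int(H * frac)), max(1, int(W * frac))
    y0 = int(rng.integers(0, H - h + 1))
    x0 = int(rng.integers(0, W - w + 1))
    region = (slice(start, start + length), slice(y0, y0 + h), slice(x0, x0 + w))
    patch = out[region].copy()

    # One severe augmentation (assumed set), applied identically to every frame in the window.
    kind = int(rng.integers(0, 3))
    if kind == 0:
        patch = 1.0 - patch                      # color inversion
    elif kind == 1:
        patch = patch[:, ::-1, ::-1, :]          # rotate box content by 180 degrees
    else:
        coarse = patch[:, ::block, ::block, :]   # pixelation: keep every block-th pixel
        patch = np.repeat(np.repeat(coarse, block, axis=1), block, axis=2)[:, :h, :w, :]

    out[region] = patch
    labels[start:start + length] = 1
    return out, labels
```

Called with `np.random.default_rng(0)` on a `(16, 32, 32, 3)` clip, it returns a distorted copy plus a 0/1 vector marking one contiguous run of at least 8 frames; frames outside that run are identical to the input. Keeping the box and transform fixed over the window is the sketch's way of modeling "preserving temporal consistency", so the anomaly is not just per-frame flicker.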
Problem

Enhancing adversarial robustness in weakly supervised video anomaly detection
Addressing ineffective video-level adversarial training with weak supervision
Mitigating noisy pseudo-labels through synthetic anomaly generation
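To make the label-noise issue concrete: in WSVAD, a normal training video's label already fixes every frame as normal, whereas frame labels inside an anomalous video are unknown and must be guessed from the model's own scores, which is where the noise enters. SRD-style synthetic videos, by contrast, carry exact frame labels. The sketch below pools these three sources and records which labels are exact. The names `pseudo_labels` and `build_frame_training_set`, the fixed threshold of 0.5, and the simple concatenation are hypothetical; the abstract does not state how FrameShield weights, filters, or mixes the sources.

```python
import numpy as np

def pseudo_labels(scores, video_label, threshold=0.5):
    """Frame labels for one real training video (assumed thresholding scheme)."""
    if video_label == 0:
        # Normal video: the video-level label makes every frame label exact.
        return np.zeros_like(scores, dtype=np.int64)
    # Anomalous video: frame labels are unknown, so threshold model scores (noisy).
    return (scores >= threshold).astype(np.int64)

def build_frame_training_set(real_videos, synthetic_videos, threshold=0.5):
    """Pool real and synthetic frames into one frame-level training set.

    real_videos: list of (features (T, D), model_scores (T,), video_label in {0, 1})
    synthetic_videos: list of (features (T, D), exact_frame_labels (T,)) from an SRD-style generator
    Returns stacked features, frame labels, and a boolean mask marking exact labels.
    """
    feats, labels, exact = [], [], []
    for x, s, y in real_videos:
        feats.append(x)
        labels.append(pseudo_labels(s, y, threshold))
        exact.append(np.full(len(s), y == 0))          # only normal-video labels are exact
    for x, lab in synthetic_videos:
        feats.append(x)
        labels.append(lab.astype(np.int64))
        exact.append(np.ones(len(lab), dtype=bool))    # synthetic labels are exact by construction
    return np.concatenate(feats), np.concatenate(labels), np.concatenate(exact)
```

The `exact` mask is one simple hook for treating noisy and exact labels differently (for example, down-weighting noisy frames); whether and how the paper does so is not specified in the abstract.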
Innovation

Generates pseudo-anomalies via spatiotemporal region distortion
Integrates synthetic anomalies with noisy pseudo-labels
Enables effective adversarial training for video anomaly detection
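Frame-level adversarial training needs a frame-level attack: each frame's perturbation is driven by that frame's own label rather than by a single video-level label. As a hedged illustration, the sketch below runs L-infinity projected gradient descent (PGD, here ascending the loss) against a toy per-frame logistic scorer on precomputed frame features, then takes one gradient step on the perturbed frames. The linear scorer, the feature-space threat model, and all hyperparameters (`eps`, `alpha`, `steps`, `lr`) are assumptions chosen to keep the example self-contained; they are not the paper's model, attack, or settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frame_bce(w, b, x, y):
    """Mean binary cross-entropy over frames for a toy linear scorer."""
    p = np.clip(sigmoid(x @ w + b), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def pgd_frame_attack(w, b, x, y, eps=0.1, alpha=0.02, steps=10):
    """L-inf PGD against per-frame labels. x: (T, D) frame features, y: (T,) in {0, 1}."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(x_adv @ w + b)
        grad = np.outer(p - y, w) / len(y)          # d(mean BCE)/d x_adv, one row per frame
        x_adv = x_adv + alpha * np.sign(grad)       # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back onto the eps-ball around x
    return x_adv

def adversarial_training_step(w, b, x, y, lr=0.5, **attack_kwargs):
    """One gradient-descent step on frames perturbed by pgd_frame_attack."""
    x_adv = pgd_frame_attack(w, b, x, y, **attack_kwargs)
    err = (sigmoid(x_adv @ w + b) - y) / len(y)     # d(mean BCE)/d logits
    w = w - lr * (x_adv.T @ err)
    b = b - lr * err.sum()
    return w, b
```

Because the scorer is linear, each frame's worst-case perturbation simply moves it by `eps` along the sign of the weight vector, in the direction that pushes its score away from its label. With a deep video backbone the gradient would come from automatic differentiation instead, and a pixel-space attack would also clip `x_adv` to the valid pixel range. The frame labels `y` fed to both functions are exactly where noisy pseudo-labels would hurt, which is why the quality of the pooled labels matters.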
Mojtaba Nafez
Master's Student, Department of Computer Engineering, Sharif University of Technology
Machine Learning
Mobina Poulaei
Department of Computer Engineering, Sharif University of Technology
Nikan Vasei
Department of Computer Engineering, Sharif University of Technology
Bardia Soltani Moakhar
Department of Industrial Engineering, Sharif University of Technology
Mohammad Sabokrou
Okinawa Institute of Science and Technology
Machine Learning, Computer Vision, Trustworthy AI
MohammadHossein Rohban
Department of Computer Engineering, Sharif University of Technology