Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts

📅 2024-01-21

🏛️ International Conferences on Pattern Recognition and Artificial Intelligence

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Video action recognition models are often not robust to real-world distribution shifts between training and test data. To address this, we propose two methods for evaluating robustness to distribution shift and a gradient-driven adversarial video augmentation framework. The first evaluation method trains on HMDB-51 or UCF-101 and tests on Kinetics-400, using only the classes shared by both datasets. The second builds class prototypes from the target dataset's training features and classifies test videos by cosine similarity to those prototypes, so it needs neither weight updates on the target dataset nor class alignment across datasets. For training, augmentation parameters are optimized by gradient ascent to produce "hard" views, and a curriculum schedules the augmentation strength. The approach outperforms baselines across three architectures: TSM, Video Swin Transformer, and UniFormer.

📝 Abstract

Although recent video action recognition models achieve strong performance on existing benchmarks, they often lack robustness when faced with natural distribution shifts between training and test data. We propose two novel evaluation methods to assess model resilience to such distribution disparity. The first method uses two datasets collected from different sources: one for training and validation, and the other for testing. More precisely, we created dataset splits that use HMDB-51 or UCF-101 for training and Kinetics-400 for testing, restricted to the subset of classes that overlap between the train and test datasets. The second method extracts the feature mean of each class from the target evaluation dataset's training data (i.e., the class prototype) and predicts each test video's class from the cosine similarity between the sample and the class prototypes of the target classes. This procedure does not alter model weights using the target dataset and does not require aligning overlapping classes of two different datasets, so it is a very efficient way to test model robustness to distribution shifts without prior knowledge of the target distribution. We address the robustness problem with adversarial augmentation training, which generates augmented views of videos that are "hard" for the classification model by applying gradient ascent on the augmentation parameters, together with "curriculum" scheduling of the strength of the video augmentations. We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models: TSM, Video Swin Transformer, and UniFormer. The presented work provides critical insight into model robustness to distribution shifts and presents effective techniques to enhance video action recognition performance in real-world deployment.
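
The class-prototype evaluation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`class_prototypes`, `prototype_predict`) are hypothetical, and the random toy features stand in for features that would, in practice, come from the frozen action recognition model.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class, computed on the target dataset's training split.
    Assumes every class has at least one sample."""
    protos = np.zeros((num_classes, features.shape[1]), dtype=features.dtype)
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(axis=0)
    return protos

def prototype_predict(test_features, prototypes):
    """Predict the class whose prototype has the highest cosine similarity."""
    sims = l2_normalize(test_features) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1), sims

# Toy stand-in for frozen-backbone features (assumption: 3 classes, 8-dim features).
rng = np.random.default_rng(0)
centers = np.eye(3, 8) * 5.0
train_labels = np.repeat(np.arange(3), 20)
train_feats = centers[train_labels] + rng.normal(scale=0.5, size=(60, 8))
test_labels = np.repeat(np.arange(3), 10)
test_feats = centers[test_labels] + rng.normal(scale=0.5, size=(30, 8))

protos = class_prototypes(train_feats, train_labels, num_classes=3)
preds, sims = prototype_predict(test_feats, protos)
accuracy = float((preds == test_labels).mean())
print(f"prototype accuracy: {accuracy:.2f}")
```

Because only feature means are computed, the model weights are never updated on the target dataset, which matches the property the abstract highlights.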

Problem

Research questions and friction points this paper is trying to address.

Assessing model resilience to video distribution shifts

Proposing adversarial augmentation training for robustness

Enhancing action recognition in real-world deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial augmentation training enhances model robustness

Class prototype similarity for efficient distribution shift testing

Curriculum scheduling optimizes video augmentation strength
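
The adversarial augmentation and curriculum ideas listed above can be sketched on a toy problem. Everything specific here is an assumption for illustration rather than the paper's method: the augmentation is a simple contrast/brightness transform, the classifier is a linear softmax over temporally mean-pooled frames (so the gradient can be written by hand), the curriculum is a linear ramp of the allowed perturbation range, and all function names are hypothetical.

```python
import numpy as np

def loss_and_grad(params, clip, label, W):
    """Cross-entropy of a toy linear classifier on the augmented clip, plus its
    gradient with respect to the augmentation parameters (contrast, brightness)."""
    contrast, brightness = params
    pooled = clip.mean(axis=0)                  # (D,) temporal mean pooling
    feat = contrast * pooled + brightness       # pooled view of contrast * clip + brightness
    z = W @ feat
    z = z - z.max()                             # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    loss = -log_probs[label]
    dlogits = np.exp(log_probs)
    dlogits[label] -= 1.0                       # d loss / d logits = probs - one_hot
    dfeat = W.T @ dlogits
    grad = np.array([dfeat @ pooled, dfeat.sum()])
    return loss, grad

def curriculum_strength(epoch, num_epochs, max_strength=0.5):
    """Assumed schedule: linear ramp of augmentation strength from 0 to max_strength."""
    return max_strength * min(1.0, epoch / max(1, num_epochs - 1))

def adversarial_augment(clip, label, W, strength, steps=10, lr=0.1):
    """Gradient ascent on (contrast, brightness), projected back into a box of
    half-width `strength` around the identity augmentation (1, 0)."""
    identity = np.array([1.0, 0.0])
    params = identity.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(params, clip, label, W)
        params = params + lr * grad             # ascent: make the view harder
        params = np.clip(params, identity - strength, identity + strength)
    contrast, brightness = params
    return contrast * clip + brightness, params

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))                    # toy classifier: 4 classes, 16-dim frames
clip = rng.normal(size=(8, 16))                 # 8 frames, each flattened to 16 values
label = 2
num_epochs = 5

clean_loss, _ = loss_and_grad(np.array([1.0, 0.0]), clip, label, W)
history = []
for epoch in range(num_epochs):
    strength = curriculum_strength(epoch, num_epochs)
    hard_clip, params = adversarial_augment(clip, label, W, strength)
    adv_loss, _ = loss_and_grad(params, clip, label, W)
    history.append((strength, adv_loss))
    print(f"epoch {epoch}: strength={strength:.3f} adversarial loss={adv_loss:.4f}")
```

In an actual training loop the model would then be updated on `hard_clip`; that step is omitted here to keep the sketch focused on how the augmentation parameters are optimized and how their allowed range grows over epochs.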
