Hybrid Robustness Verification for Spatio-Temporal Neural Networks

📅 2026-06-08

🤖 AI Summary

Existing neural network robustness verification methods struggle to handle spatiotemporal data such as videos efficiently, primarily because they overlook the structured nature of adversarial perturbations. This work proposes Spatio-Temporal Bound Propagation (STBP), the first hybrid verification framework tailored for 3D CNNs that explicitly models structured spatiotemporal perturbations: it computes an exact closed-form characterization of the initial convolutional layer and propagates certified bounds through subsequent layers with efficient approximations, enabling scalable certification. The authors also introduce ST-Bench, the first benchmark for evaluating spatiotemporal robustness in autonomous driving and action recognition tasks. Experiments demonstrate that, under identical perturbation budgets, the proposed method achieves 1.7× higher certified robust accuracy than current state-of-the-art approaches, substantially enhancing verifiable safety on benchmarks including UCF-101, Udacity, and MedMNIST.

📝 Abstract

With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the underlying models is essential. Existing verification methods either rely on overly conservative approximations or incur prohibitive computational costs. For example, the use of ℓp-norm perturbations in video settings encodes the belief that the adversary can inject noise into every video frame. In practice, adversarial perturbations exhibit structured spatial and temporal correlations and are constrained to lower-dimensional, semantically meaningful subspaces. In this work, we study robustness verification of 3D CNNs processing video and volumetric inputs, targeting applications in action recognition (UCF-101), autonomous driving (Udacity), and medical imaging (MedMNIST). We exploit realistic assumptions on adversarial strength by modelling perturbations as spatio-temporal constraints, under which the attacker can modify either a subset of frames or patches within a set of consecutive frames. We demonstrate that modelling realistic constraints enables tighter approximations. We introduce Spatio-Temporal Bound Propagation (STBP), a verification framework that computes an exact closed-form characterization of the first convolutional layer and propagates certified bounds through subsequent layers using scalable approximations. The exact closed form provides the tightest possible bounds for the first convolutional layer, so approximation methods are needed only in the remainder of the network. To spur further progress in this field, we propose ST-Bench, a verification benchmark for autonomous driving and activity recognition, to systematically evaluate verifiable robustness. Compared to existing verification-based approaches, STBP provides stronger robustness guarantees with significantly improved scalability, achieving 1.7× higher certified robust accuracy under identical perturbation budgets.
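To make the hybrid idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes an ℓ∞ budget `eps` applied only at positions selected by a patch-over-consecutive-frames mask, writes the first layer as a dense affine map (a 3D convolution is one such map), ignores clipping to the valid pixel range, and uses plain interval propagation as a stand-in for the "scalable approximations" of later layers. All function names (`patch_mask`, `first_layer_bounds`, `certify_bounds`, and so on) are hypothetical.

```python
import random


def patch_mask(T, H, W, frames, top, left, size):
    """Flattened 0/1 mask over a T x H x W clip: 1.0 where the attacker may write."""
    t0, t1 = frames  # half-open range of consecutive frames
    return [
        1.0 if (t0 <= t < t1 and top <= r < top + size and left <= c < left + size) else 0.0
        for t in range(T) for r in range(H) for c in range(W)
    ]


def affine(weight, bias, x):
    """Dense affine map y = W x + b on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def first_layer_bounds(x, mask, eps, weight, bias):
    """Exact per-neuron bounds when only masked inputs may move by at most eps.

    For an affine neuron the worst case is attained at a corner of the box,
    so the radius is eps * sum of |w_i| over the masked positions.
    """
    center = affine(weight, bias, x)
    radius = [eps * sum(abs(w) * m for w, m in zip(row, mask)) for row in weight]
    return ([c - r for c, r in zip(center, radius)],
            [c + r for c, r in zip(center, radius)])


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(v, 0.0) for v in lo], [max(v, 0.0) for v in hi]


def interval_affine(lo, hi, weight, bias):
    """Sound but generally loose interval propagation through a later affine layer."""
    mid = [(l + h) / 2.0 for l, h in zip(lo, hi)]
    rad = [(h - l) / 2.0 for l, h in zip(lo, hi)]
    center = affine(weight, bias, mid)
    radius = [sum(abs(w) * r for w, r in zip(row, rad)) for row in weight]
    return ([c - r for c, r in zip(center, radius)],
            [c + r for c, r in zip(center, radius)])


def certify_bounds(x, mask, eps, layer1, layer2):
    """Exact first layer, then ReLU, then interval bounds for the second layer."""
    lo, hi = first_layer_bounds(x, mask, eps, *layer1)
    lo, hi = relu_bounds(lo, hi)
    return interval_affine(lo, hi, *layer2)


if __name__ == "__main__":
    rng = random.Random(0)
    T, H, W = 3, 4, 4
    n = T * H * W
    x = [rng.random() for _ in range(n)]
    layer1 = ([[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)], [0.0] * 5)
    layer2 = ([[rng.gauss(0, 1) for _ in range(5)] for _ in range(3)], [0.0] * 3)

    # Attacker edits a 2x2 patch in frames 1 and 2, versus every pixel of every frame.
    structured = patch_mask(T, H, W, (1, 3), 1, 1, 2)
    everywhere = [1.0] * n
    for name, mask in [("patch", structured), ("full", everywhere)]:
        lo, hi = certify_bounds(x, mask, 0.1, layer1, layer2)
        print(name, [round(h - l, 3) for l, h in zip(lo, hi)])
```

Running the script prints the output interval widths for the structured and the unrestricted perturbation. The structured mask can never give wider intervals than the full mask, which is the intuition behind the abstract's claim that realistic constraints enable tighter approximations.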

Problem

Research questions and friction points this paper is trying to address.

robustness verification

spatio-temporal neural networks

adversarial perturbations

3D CNNs

formal guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

spatio-temporal robustness

3D CNN verification

bound propagation