SORA: Free Second-Order Attacks in Fast Adversarial Training

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic overfitting in single-step fast adversarial training, where robustness against multi-step attacks collapses even though single-step robustness stays high. To mitigate Epsilon Overfitting (ε-overfitting), in which fixed perturbation magnitudes and directions worsen this collapse, the authors propose an adaptive adversarial training method that introduces perturbation variability and dynamically adjusts the perturbation step size based on the geometry of the loss landscape. Key contributions include a novel perspective on ε-overfitting; PertAlign, a lightweight perturbation alignment metric that predicts the onset of catastrophic overfitting; and SORA, a strategy for leveraging second-order information without additional computational overhead. Evaluated across diverse datasets and model architectures with a single hyperparameter setting, the method matches or surpasses the robustness of existing fast adversarial training approaches while delivering higher clean accuracy.
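To make "perturbation variability" concrete, here is a minimal sketch of a single signed-gradient attack step whose size is drawn at random per example rather than fixed. It is not the authors' SORA procedure. The toy logistic model, the function names, and the `low`/`high` sampling range are all assumptions chosen for illustration.

```python
import math
import random


def loss_grad_x(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) with respect to x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [scale * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def single_step_attack(w, x, y, eps, rng, low=0.5, high=1.0):
    """One signed-gradient step with a randomly drawn step size.

    The step size is eps times a uniform draw from [low, high]; this range is
    a hypothetical choice, not a value taken from the paper.
    """
    alpha = eps * rng.uniform(low, high)
    grad = loss_grad_x(w, x, y)
    delta = [alpha * sign(g) for g in grad]
    # Keep the perturbation inside the L-infinity ball of radius eps.
    delta = [max(-eps, min(eps, d)) for d in delta]
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return x_adv, alpha
```

Calling `single_step_attack(w, x, y, eps, random.Random(0))` repeatedly with the same generator yields a different perturbation magnitude on each call, which is the kind of variability the summary refers to; how SORA actually selects the step size from the loss geometry is not specified here.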

📝 Abstract

Adversarial Training (AT) is a leading defense against adversarial examples but often suffers from Catastrophic Overfitting (CO) in efficient single-step variants, where robustness to multi-step attacks collapses despite high single-step performance. We address this failure mode with two contributions. First, we formalize Epsilon Overfitting (EO), a perspective in which fixed perturbation magnitudes and directions exacerbate CO, and show that introducing perturbation variability significantly improves robust generalization across different architectures and datasets. Second, we propose PertAlign (Perturbation Alignment), a theoretically grounded, computationally negligible metric that predicts CO onset by measuring gradient alignment across attack stages. Leveraging these insights, we introduce SORA, an adaptive step-size AT method that dynamically adjusts perturbations based on loss surface geometry. SORA consistently prevents CO, achieves state-of-the-art robustness and clean accuracy, and generalizes across datasets and architectures using a single fixed set of hyperparameters, which is essential for applicability in fast AT. Extensive experiments on diverse datasets and architectures show that SORA matches or surpasses the robustness of prior methods while delivering higher clean accuracy and superior efficiency. Code is available at https://github.com/SecondOrderAT/SORA.
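The abstract describes PertAlign only as a measure of gradient alignment across attack stages, without giving its formula. As a rough illustration of that idea, the sketch below computes the cosine similarity between the input gradient at a clean point and the input gradient after one signed-gradient step. The function names, the use of central finite differences, and the choice of cosine similarity are assumptions, not the paper's definition.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def numerical_grad(loss, x, h=1e-5):
    """Central finite-difference gradient of a scalar loss at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((loss(x_plus) - loss(x_minus)) / (2.0 * h))
    return grad


def gradient_alignment(loss, x, eps):
    """Cosine between the gradient at x and the gradient after one signed step.

    Hypothetical stand-in for an alignment score: values near 1 suggest the
    loss is locally close to linear along the step, lower values suggest
    curvature that a single fixed-size step does not capture.
    """
    g_clean = numerical_grad(loss, x)
    x_adv = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, g_clean)]
    g_adv = numerical_grad(loss, x_adv)
    return cosine(g_clean, g_adv)
```

For a linear loss the score stays at 1 for any step size, while for a curved loss it falls as `eps` grows. In a real training loop the two gradients would come from backpropagation rather than finite differences, so a score of this kind would reuse gradients that are already available.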

Problem

Research questions and friction points this paper is trying to address.

Catastrophic Overfitting

Adversarial Training

Robustness

Single-step Attacks

Multi-step Attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Catastrophic Overfitting

Epsilon Overfitting

Perturbation Alignment