Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

📅 2024-02-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In adversarial training, the trade-off between robustness and clean accuracy is hindered by gradient conflicts induced by invariance regularization and by distributional entanglement between clean and adversarial samples. To address this, we propose Asymmetric Representation-regularized Adversarial Training (ARAT), which introduces, for the first time, a stop-gradient-augmented asymmetric invariance loss and a decoupled BatchNorm mechanism that models clean and adversarial sample statistics separately. ARAT jointly optimizes two complementary pathways, gradient-dynamics control and distribution-aware representation learning, thereby alleviating objective conflicts. Extensive experiments on CIFAR-10/100 and ImageNet under diverse adversarial attacks demonstrate that ARAT achieves a superior robustness-clean accuracy trade-off, outperforming state-of-the-art methods. Moreover, ARAT offers a novel, interpretable theoretical perspective on knowledge distillation-based defenses, linking representation asymmetry to generalization and robustness.

📝 Abstract
Adversarial training often suffers from a robustness-accuracy trade-off, where achieving high robustness comes at the cost of accuracy. One approach to mitigate this trade-off is leveraging invariance regularization, which encourages model invariance under adversarial perturbations; however, it still leads to accuracy loss. In this work, we closely analyze the challenges of using invariance regularization in adversarial training and understand how to address them. Our analysis identifies two key issues: (1) a "gradient conflict" between invariance and classification objectives, leading to suboptimal convergence, and (2) the mixture distribution problem arising from diverged distributions between clean and adversarial inputs. To address these issues, we propose Asymmetric Representation-regularized Adversarial Training (ARAT), which incorporates an asymmetric invariance loss with a stop-gradient operation and a predictor to avoid gradient conflict, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our detailed analysis demonstrates that each component effectively addresses the identified issues, offering novel insights into adversarial defense. ARAT shows superiority over existing methods across various settings. Finally, we discuss the implications of our findings for knowledge distillation-based defenses, providing a new perspective on their relative successes.
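To make the two components concrete, here is a minimal NumPy sketch of the ideas the abstract describes: an asymmetric invariance loss where only the adversarial branch receives gradients (the clean representation is treated as a stop-gradient constant and the adversarial one passes through a predictor), and a split-BN layer that tracks separate statistics for clean and adversarial inputs. This is an illustrative reconstruction, not the paper's implementation: the function names (`asymmetric_invariance_loss`, `SplitBatchNorm`), the cosine-similarity form of the loss, and all hyperparameters are assumptions for the sketch.

```python
import numpy as np

def stop_gradient(x):
    # In an autodiff framework this would detach x from the graph;
    # in this forward-only NumPy sketch it is just a copy.
    return x.copy()

def cosine_sim(a, b, eps=1e-8):
    # Row-wise cosine similarity between two batches of representations.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return (a * b).sum(axis=1)

def asymmetric_invariance_loss(z_adv, z_clean, predictor):
    # Asymmetric invariance: the adversarial representation is mapped
    # through a predictor head, while the clean representation is a
    # constant target (stop-gradient), so the classification objective
    # on the clean branch is not pulled toward the adversarial one.
    p_adv = predictor(z_adv)
    target = stop_gradient(z_clean)
    return float((1.0 - cosine_sim(p_adv, target)).mean())

class SplitBatchNorm:
    """Split-BN sketch: separate running statistics per input branch,
    so clean and adversarial samples are not normalized with a single
    mixture distribution."""

    def __init__(self, dim, momentum=0.1, eps=1e-5):
        self.eps = eps
        self.momentum = momentum
        self.stats = {k: {"mean": np.zeros(dim), "var": np.ones(dim)}
                      for k in ("clean", "adv")}

    def __call__(self, x, branch):
        # Normalize with batch statistics and update the running
        # statistics of the selected branch only.
        s = self.stats[branch]
        mu, var = x.mean(axis=0), x.var(axis=0)
        s["mean"] = (1 - self.momentum) * s["mean"] + self.momentum * mu
        s["var"] = (1 - self.momentum) * s["var"] + self.momentum * var
        return (x - mu) / np.sqrt(var + self.eps)
```

With an identity predictor, identical clean and adversarial representations give a loss of 0, and orthogonal representations give 1; in training, the predictor would be a small learned head and the gradient would flow only through `z_adv`.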
Problem

Research questions and friction points this paper is trying to address.

Adversarial Training
Robustness
Accuracy Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric Invariance Loss
Special Batch Normalization
Adversarial Training Enhancement