Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the mechanism behind the differing robustness-generalization trade-offs observed when adversarially training vision versus language models, attributing the difference to how intrinsic dimensionality is distributed across intermediate-layer manifolds in distinct architectures (e.g., ViTs, encoder-based and decoder-based language models). Building on this, the authors propose Stratified Manifold Adaptive Adversarial Training (SMAAT): using layer-wise intrinsic dimension estimation, SMAAT perturbs the intermediate layer with the lowest intrinsic dimensionality, which yields a higher proportion of off-manifold adversarial examples at lower cost. The paper provides what the authors describe as the first explanation for why adversarial training hurts generalization in vision models but not in encoder-based language models, and the method applies across ViT and encoder/decoder architectures. Experiments show that SMAAT requires only 25-33% of the GPU time of standard adversarial training while significantly improving robustness on sentiment classification, safety filtering, and RAG retrieval tasks, with comparable generalization.
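The layer-wise intrinsic dimension (ID) estimation the summary describes can be sketched with the TwoNN estimator (one standard choice; the paper's actual estimator and code may differ). All function names, shapes, and data below are illustrative, not from the authors' implementation:

```python
# Hedged sketch: estimate the intrinsic dimension of each layer's activation
# manifold and pick the layer with the lowest ID (the one SMAAT would perturb).
# Uses the TwoNN estimator: ID is the MLE from the ratio of each point's
# second- to first-nearest-neighbor distance.
import numpy as np

def twonn_id(activations: np.ndarray) -> float:
    """Estimate the intrinsic dimension of a point cloud (n_samples, n_features)."""
    d = np.linalg.norm(activations[:, None, :] - activations[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distances
    two_nn = np.sort(d, axis=1)[:, :2]   # r1, r2 for every point
    mu = two_nn[:, 1] / two_nn[:, 0]     # ratio of 2nd to 1st NN distance
    # maximum-likelihood estimate: ID = n / sum(log mu_i)
    return activations.shape[0] / np.log(mu).sum()

def lowest_id_layer(layer_acts: list[np.ndarray]) -> int:
    """Index of the layer whose activations have minimal intrinsic dimension."""
    return int(np.argmin([twonn_id(a) for a in layer_acts]))
```

For example, activations lying on a 2-D manifold embedded in 10-D space should yield an ID estimate near 2, well below that of a full-rank 10-D cloud.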

📝 Abstract
Despite being a heavily researched topic, Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons: (i) the gained robustness is frequently accompanied by a drop in generalization and (ii) generating adversarial examples (AEs) is computationally prohibitively expensive. To address these limitations, we propose SMAAT, a new AT algorithm that leverages the manifold conjecture, stating that off-manifold AEs lead to better robustness while on-manifold AEs result in better generalization. Specifically, SMAAT aims at generating a higher proportion of off-manifold AEs by perturbing the intermediate deepnet layer with the lowest intrinsic dimension. This systematically results in better scalability compared to classical AT as it reduces the PGD chain length required for generating the AEs. Additionally, our study provides, to the best of our knowledge, the first explanation for the difference in the generalization and robustness trends between vision and language models, i.e., AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged. We show that vision transformers and decoder-based models tend to have low intrinsic dimensionality in the earlier layers of the network (more off-manifold AEs), while encoder-based models have low intrinsic dimensionality in the later layers. We demonstrate the efficacy of SMAAT on several tasks, including robustifying (i) sentiment classifiers, (ii) safety filters in decoder-based models, and (iii) retrievers in RAG setups. SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications and maintaining comparable generalization.
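The abstract's efficiency argument is that running PGD from an intermediate activation, rather than the input, means each attack step only traverses the layers above the perturbed one. A minimal sketch of that idea, with a single linear head standing in for the upper layers (all names, shapes, and hyperparameters here are illustrative assumptions, not the paper's setup):

```python
# Hedged sketch: l_inf PGD applied to a hidden activation h instead of the
# input. The linear head (logits = W @ h) is a stand-in for the layers above
# the perturbed low-ID layer; gradients are computed analytically.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_on_hidden(h, y, W, eps=0.5, alpha=0.1, steps=5):
    """Perturb hidden activation h to maximize cross-entropy loss of a
    linear head against label y, projected to an eps l_inf-ball around h."""
    h_adv = h.copy()
    for _ in range(steps):
        p = softmax(W @ h_adv)
        p[y] -= 1.0                               # d(loss)/d(logits) = softmax - onehot
        grad_h = W.T @ p                          # backprop through the head only
        h_adv = h_adv + alpha * np.sign(grad_h)   # gradient-ascent step
        h_adv = h + np.clip(h_adv - h, -eps, eps) # project back to the eps-ball
    return h_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))       # toy head: 8-dim hidden layer, 3 classes
h = rng.normal(size=8)
y = int(np.argmax(W @ h))         # label assigned to the clean activation
h_adv = pgd_on_hidden(h, y, W)
```

Because each step backpropagates only through the head, the cost per PGD step shrinks with the depth of the chosen layer, which is the source of the shorter PGD chains the abstract mentions.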
Problem

Research questions and friction points this paper is trying to address.

Explains how Intrinsic Dimensionality influences Adversarial Training outcomes
Proposes SMAAT to enhance AT scalability for encoder-based models
Validates SMAAT across tasks for improved robustness and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages manifold conjecture for adversarial training
Introduces SMAAT for scalable encoder-based AT
Reduces PGD chain length, cuts GPU time
Enes Altinisik
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar
Safa Messaoud
Scientist, Qatar Computing Research Institute (QCRI)
Safe AI · Energy Based Models · Reinforcement Learning · Computer Vision · Health intelligence
H. Sencar
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar
Hassan Sajjad
Faculty of Computer Science, Dalhousie University
Deep Learning · NLP · Interpretability · Explainable AI · Unsupervised and semi-supervised methods
Sanjay Chawla
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar