Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the mechanism behind the differing robustness-generalization trade-offs observed when adversarially training vision versus language models, attributing the difference to how intrinsic dimensionality is distributed across intermediate-layer manifolds in distinct architectures (e.g., ViTs, encoder-based and decoder-based language models). Building on this, the authors propose Stratified Manifold Adaptive Adversarial Training (SMAAT): using layer-wise intrinsic dimension estimation, SMAAT perturbs the intermediate layer with the lowest intrinsic dimensionality, which yields a higher proportion of off-manifold adversarial examples at lower cost. The paper provides what the authors describe as the first explanation for why adversarial training hurts generalization in vision models but not in encoder-based language models, and the method applies across ViT and encoder/decoder architectures. Experiments show that SMAAT requires only 25-33% of the GPU time of standard adversarial training while significantly improving robustness on sentiment classification, safety filtering, and RAG retrieval tasks, with comparable generalization.
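The layer-wise intrinsic dimension (ID) estimation the summary describes can be sketched with the TwoNN estimator (one standard choice; the paper's actual estimator and code may differ). All function names, shapes, and data below are illustrative, not from the authors' implementation:

```python
# Hedged sketch: estimate the intrinsic dimension of each layer's activation
# manifold and pick the layer with the lowest ID (the one SMAAT would perturb).
# Uses the TwoNN estimator: ID is the MLE from the ratio of each point's
# second- to first-nearest-neighbor distance.
import numpy as np

def twonn_id(activations: np.ndarray) -> float:
    """Estimate the intrinsic dimension of a point cloud (n_samples, n_features)."""
    d = np.linalg.norm(activations[:, None, :] - activations[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distances
    two_nn = np.sort(d, axis=1)[:, :2]   # r1, r2 for every point
    mu = two_nn[:, 1] / two_nn[:, 0]     # ratio of 2nd to 1st NN distance
    # maximum-likelihood estimate: ID = n / sum(log mu_i)
    return activations.shape[0] / np.log(mu).sum()

def lowest_id_layer(layer_acts: list[np.ndarray]) -> int:
    """Index of the layer whose activations have minimal intrinsic dimension."""
    return int(np.argmin([twonn_id(a) for a in layer_acts]))
```

For example, activations lying on a 2-D manifold embedded in 10-D space should yield an ID estimate near 2, well below that of a full-rank 10-D cloud.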

📝 Abstract
Despite being a heavily researched topic, Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons: (i) the gained robustness is frequently accompanied by a drop in generalization and (ii) generating adversarial examples (AEs) is computationally prohibitively expensive. To address these limitations, we propose SMAAT, a new AT algorithm that leverages the manifold conjecture, stating that off-manifold AEs lead to better robustness while on-manifold AEs result in better generalization. Specifically, SMAAT aims at generating a higher proportion of off-manifold AEs by perturbing the intermediate deepnet layer with the lowest intrinsic dimension. This systematically results in better scalability compared to classical AT as it reduces the PGD chain length required for generating the AEs. Additionally, our study provides, to the best of our knowledge, the first explanation for the difference in the generalization and robustness trends between vision and language models, i.e., AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged. We show that vision transformers and decoder-based models tend to have low intrinsic dimensionality in the earlier layers of the network (more off-manifold AEs), while encoder-based models have low intrinsic dimensionality in the later layers. We demonstrate the efficacy of SMAAT on several tasks, including robustifying (i) sentiment classifiers, (ii) safety filters in decoder-based models, and (iii) retrievers in RAG setups. SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications and maintaining comparable generalization.
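The abstract's efficiency argument is that running PGD from an intermediate activation, rather than the input, means each attack step only traverses the layers above the perturbed one. A minimal sketch of that idea, with a single linear head standing in for the upper layers (all names, shapes, and hyperparameters here are illustrative assumptions, not the paper's setup):

```python
# Hedged sketch: l_inf PGD applied to a hidden activation h instead of the
# input. The linear head (logits = W @ h) is a stand-in for the layers above
# the perturbed low-ID layer; gradients are computed analytically.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_on_hidden(h, y, W, eps=0.5, alpha=0.1, steps=5):
    """Perturb hidden activation h to maximize cross-entropy loss of a
    linear head against label y, projected to an eps l_inf-ball around h."""
    h_adv = h.copy()
    for _ in range(steps):
        p = softmax(W @ h_adv)
        p[y] -= 1.0                               # d(loss)/d(logits) = softmax - onehot
        grad_h = W.T @ p                          # backprop through the head only
        h_adv = h_adv + alpha * np.sign(grad_h)   # gradient-ascent step
        h_adv = h + np.clip(h_adv - h, -eps, eps) # project back to the eps-ball
    return h_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))       # toy head: 8-dim hidden layer, 3 classes
h = rng.normal(size=8)
y = int(np.argmax(W @ h))         # label assigned to the clean activation
h_adv = pgd_on_hidden(h, y, W)
```

Because each step backpropagates only through the head, the cost per PGD step shrinks with the depth of the chosen layer, which is the source of the shorter PGD chains the abstract mentions.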
Problem

Research questions and friction points this paper is trying to address.

Explains how Intrinsic Dimensionality influences Adversarial Training outcomes
Proposes SMAAT to enhance AT scalability for encoder-based models
Validates SMAAT across tasks for improved robustness and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages manifold conjecture for adversarial training
Introduces SMAAT for scalable encoder-based AT
Reduces PGD chain length, cuts GPU time
Enes Altinisik
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar
Safa Messaoud
Scientist, Qatar Computing Research Institute (QCRI)
Safe AI · Energy Based Models · Reinforcement Learning · Computer Vision · Health intelligence
H. Sencar
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar
Hassan Sajjad
Faculty of Computer Science, Dalhousie University
Deep Learning · NLP · Interpretability · Explainable AI · Unsupervised and semi-supervised methods
Sanjay Chawla
Qatar Computing Research Institute, HBKU Research Complex, Doha, Qatar