Scaling Laws for Black-box Adversarial Attacks

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 2
Influential: 1
🤖 AI Summary
In commercial black-box settings, cross-model transferability of adversarial attacks is often limited due to architectural and training disparities. Method: This work investigates the scaling relationship between the number of surrogate models and transfer success rate in black-box adversarial attacks, proposing a unified multi-model ensemble-based transfer attack framework compatible with diverse methods—including PGD and MI-FGSM—and evaluating it across heterogeneous architectures: CNNs, robustly trained models, and closed-source multimodal large language models (e.g., GPT-4o). Contribution/Results: We empirically discover and validate a clear positive scaling law: increasing ensemble size significantly improves transfer success rates—exceeding 90% on both standard models and GPT-4o—while simultaneously enhancing the semantic fidelity and interpretability of generated adversarial perturbations. This study establishes the first theoretical and practical foundation for scalable, high-transfer black-box attacks.
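The core ensemble mechanism described above (averaging gradients across many surrogate models, here combined with MI-FGSM-style momentum) can be sketched roughly as follows. This is a minimal illustration assuming PyTorch classifier surrogates; the function name, hyperparameter defaults, and normalization details are illustrative, not the paper's exact implementation.

```python
import torch

def ensemble_mifgsm(models, x, y, eps=8/255, alpha=2/255, steps=10, mu=1.0):
    """Sketch of an ensemble transfer attack: each step averages the
    classification loss over all surrogate models, then applies an
    MI-FGSM-style momentum update to the adversarial example."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # momentum accumulator
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # average loss over the surrogate ensemble (the quantity being scaled)
        loss = sum(loss_fn(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # momentum update with a normalized gradient, as in MI-FGSM
        g = mu * g + grad / grad.abs().mean().clamp(min=1e-12)
        # signed step, then project back into the eps-ball and valid pixel range
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```

The scaling-law experiments then amount to growing `models` (the surrogate list) and measuring transfer success on a held-out black-box target.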

📝 Abstract
Adversarial examples usually exhibit good cross-model transferability, enabling attacks on black-box models with limited information about their architectures and parameters, which makes them highly threatening in commercial black-box scenarios. Model ensembling is an effective strategy to improve the transferability of adversarial examples by attacking multiple surrogate models. However, since prior studies usually adopt few models in the ensemble, it remains an open question whether scaling the number of models can further improve black-box attacks. Inspired by the scaling laws of large foundation models, we investigate the scaling laws of black-box adversarial attacks in this work. Through theoretical analysis and empirical evaluations, we establish clear scaling laws showing that using more surrogate models enhances adversarial transferability. Comprehensive experiments verify the claims on standard image classifiers, diverse defended models, and multimodal large language models using various adversarial attack methods. Specifically, by exploiting this scaling law, we achieve a 90%+ transfer attack success rate even on proprietary models like GPT-4o. Further visualization indicates that there is also a scaling law in the interpretability and semantics of adversarial perturbations.
Problem

Research questions and friction points this paper is trying to address.

Investigates scaling laws for black-box adversarial attacks
Examines if more surrogate models improve transferability
Tests scaling effects on interpretability of perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling model ensembles boosts adversarial transferability
Theoretical and empirical validation of scaling laws
Achieves 90%+ success rate on proprietary models
Chuan Liu
University of Rochester
Huanran Chen
PhD student, Tsinghua SAIL
Machine Learning Theory · Optimization · AI Safety
Yichi Zhang
Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China; RealAI
Yinpeng Dong
Tsinghua University
Machine Learning · Deep Learning · AI Safety
Jun Zhu
Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, 100084, China; RealAI