DiffBreak: Breaking Diffusion-Based Purification with Adaptive Attacks

📅 2024-11-25
🤖 AI Summary
This paper reveals an intrinsic vulnerability of Diffusion-Based Purification (DBP) under adaptive gradient-based attacks: by backpropagating through the diffusion process itself, adversaries can steer purified outputs into adversarial distributions, nullifying the defense's claimed robustness. Method: the authors give the first theoretical proof that DBP is fundamentally breakable via targeted attacks; develop DiffBreak, a dedicated gradient library enabling reliable gradient computation through diffusion models; and adapt a systematic perturbation-construction method from attacks on deepfake watermarking, achieving perfect attack success even under strong robustness settings (e.g., majority voting). Contribution/Results: experiments show that the attack significantly degrades DBP accuracy under both standard and stringent evaluation protocols, exposing critical security flaws in the backpropagation mechanisms of existing DBP implementations and establishing a foundational security boundary for trustworthy diffusion-based defenses.
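The majority-voting setting mentioned above can be illustrated with a toy sketch: the defense purifies several stochastic copies of the input and aggregates the classifier's votes. Everything below is a hypothetical stand-in (a noise-and-shrink "purifier", a trivial linear classifier), not the paper's actual pipeline or the DiffBreak library.

```python
import numpy as np

rng = np.random.default_rng(0)

def purify(x, sigma=0.3):
    # Toy stand-in for stochastic diffusion purification:
    # perturb the input with noise, then shrink it back toward the data manifold.
    return 0.9 * (x + sigma * rng.standard_normal(x.shape))

def classify(x):
    # Toy two-class classifier: sign of the first coordinate.
    return int(x[0] > 0)

def majority_vote(x, n_copies=11):
    # Aggregate predictions over several independently purified copies.
    votes = [classify(purify(x)) for _ in range(n_copies)]
    return int(np.bincount(votes, minlength=2).argmax())

x = np.array([0.8, -0.2])
print(majority_vote(x))  # → 1
```

The stochasticity of each purification pass is exactly what the abstract credits with partially disrupting adversarial alignment in this stricter setting.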

📝 Abstract
Diffusion-based purification (DBP) has emerged as a cornerstone defense against adversarial examples (AEs), widely regarded as robust due to its use of diffusion models (DMs) that project AEs onto the natural data distribution. However, contrary to prior assumptions, we theoretically prove that adaptive gradient-based attacks nullify this foundational claim, effectively targeting the DM rather than the classifier and causing purified outputs to align with adversarial distributions. This surprising discovery prompts a reassessment of DBP's robustness, revealing it stems from critical flaws in backpropagation techniques used so far for attacking DBP. To address these gaps, we introduce DiffBreak, a novel and reliable gradient library for DBP, which exposes how adaptive attacks drastically degrade its robustness. In stricter majority-vote settings, where classifier decisions aggregate predictions over multiple purified inputs, DBP retains partial robustness to traditional norm-bounded AEs due to its stochasticity disrupting adversarial alignment. However, we propose a novel adaptation of a recent optimization method against deepfake watermarking, crafting systemic adversarial perturbations that defeat DBP even under these conditions, ultimately challenging its viability as a defense without improvements.
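The adaptive-attack idea in the abstract can be sketched in a few lines: instead of attacking the classifier f alone, the adversary differentiates through the purifier P and optimizes the input against the composition f(P(x)). The purifier and classifier below are toy linear stand-ins chosen so the chain rule is explicit; this is an illustrative sketch, not the paper's DiffBreak implementation.

```python
import numpy as np

A = 0.9 * np.eye(2)            # toy "purifier": deterministic shrinkage map
W = np.eye(2)                  # toy classifier: logits = W @ purified input

def logits(x):
    return W @ (A @ x)         # classifier applied to the purified input

def loss_grad(x, y):
    # Gradient of softmax cross-entropy at label y w.r.t. x,
    # backpropagated through the purifier via the chain rule.
    z = logits(x)
    p = np.exp(z - z.max()); p /= p.sum()
    p[y] -= 1.0                # dL/dz for softmax cross-entropy
    return A.T @ (W.T @ p)

def pgd(x0, y, eps=1.0, alpha=0.25, steps=10):
    # L-inf PGD that ascends the true-label loss through the purifier.
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(loss_grad(x, y))
        x = x0 + np.clip(x - x0, -eps, eps)   # project back into the eps-ball
    return x

x0 = np.array([1.0, 0.0])
y = int(np.argmax(logits(x0)))        # originally predicted class (0)
x_adv = pgd(x0, y)
print(int(np.argmax(logits(x_adv))))  # → 1: the purified prediction flips
```

The key design point mirrors the abstract's claim: because the gradient passes through A, the optimization effectively targets the purifier rather than the classifier alone, so the purified output itself is steered into the adversarial region.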
Problem

Research questions and friction points this paper is trying to address.

Adaptive attacks disrupt diffusion-based purification robustness.
DiffBreak exposes flaws in current backpropagation techniques.
Systemic perturbations defeat DBP in majority-vote settings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the DiffBreak gradient library
Adapts an optimization method from attacks on deepfake watermarking
Crafts systemic perturbations that defeat diffusion purification
Andre Kassis
Cheriton School of Computer Science, University of Waterloo, Canada
Urs Hengartner
University of Waterloo
Security, Privacy
Yaoliang Yu
University of Waterloo
Machine learning, Optimization