Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models

📅 2025-08-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing backdoor attacks against text-to-image diffusion models rely on unnatural prompts, require large-scale poisoned datasets, suffer from poor generalizability, and are vulnerable to defenses. This paper proposes the first model-agnostic, universal backdoor attack framework for diffusion models. By integrating a stealthy trigger mechanism with lightweight adversarial optimization, our method achieves over 90% attack success rates across mainstream models—including Stable Diffusion and SDXL—using only ten poisoned samples. Crucially, it preserves normal-generation fidelity while ensuring strong stealthiness and robustness against state-of-the-art detection and adaptive defenses; human perceptual evaluation further confirms its effectiveness. The core innovation lies in the first demonstration of a unified solution achieving simultaneously high attack success rate, broad cross-model generalizability, minimal data dependency (i.e., ultra-low-shot poisoning), and strong concealment—thereby overcoming fundamental limitations of prior work.

Technology Category

Application Category

📝 Abstract

Text-to-image diffusion models (T2I DMs) have achieved remarkable success in generating high-quality and diverse images from text prompts, yet recent studies have revealed their vulnerability to backdoor attacks. Existing attack methods suffer from critical limitations: 1) they rely on unnatural adversarial prompts that lack human readability and require massive poisoned data; 2) their effectiveness is typically restricted to specific models, lacking generalizability; and 3) they can be mitigated by recent backdoor defenses. To overcome these challenges, we propose a novel backdoor attack framework that achieves three key properties: 1) emph{Practicality}: Our attack requires only a few stealthy backdoor samples to generate arbitrary attacker-chosen target images, as well as ensuring high-quality image generation in benign scenarios. 2) emph{Generalizability:} The attack is applicable across multiple T2I DMs without requiring model-specific redesign. 3) emph{Robustness:} The attack remains effective against existing backdoor defenses and adaptive defenses. Our extensive experimental results on multiple T2I DMs demonstrate that with only 10 carefully crafted backdoored samples, our attack method achieves $>$90% attack success rate with negligible degradation in benign image generation quality. We also conduct human evaluation to validate our attack effectiveness. Furthermore, recent backdoor detection and mitigation methods, as well as adaptive defense tailored to our attack are not sufficiently effective, highlighting the pressing need for more robust defense mechanisms against the proposed attack.

Problem

Research questions and friction points this paper is trying to address.

Existing backdoor attacks lack practicality and human readability

Current methods are not generalizable across different diffusion models

Existing attacks are vulnerable to modern backdoor defenses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Few stealthy samples for high attack success

Applicable across multiple models without redesign

Effective against existing and adaptive defenses

🔎 Similar Papers

No similar papers found.

Authors to Follow