DiffGuard: Text-Based Safety Checker for Diffusion Models

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the security vulnerability of open-source diffusion models (e.g., Stable Diffusion) in text-to-image generation, where malicious prompts can induce harmful content, a risk that is particularly acute in high-stakes misuse scenarios such as information warfare. We propose an end-to-end trainable, purely textual safety filter. Our approach is the first to systematically expose severe deficiencies in the built-in ethical filters of mainstream models. It introduces a novel text encoder that integrates contrastive learning with multi-granularity semantic modeling, coupled with adversarial prompt detection and fine-grained harm classification, enabling real-time, pre-generation risk interception without image synthesis or post-hoc processing. Experiments demonstrate that our method achieves over 14% higher cross-model and multilingual filtering accuracy than state-of-the-art baselines, reduces false positive rates by 22%, and supports zero-shot transfer to unseen models and emerging harm categories.
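The summary above describes a purely textual filter that scores a prompt and intercepts it before any image synthesis. The minimal sketch below illustrates that control flow only; the harm taxonomy, scoring lexicon, and threshold are hypothetical placeholders and do not reproduce the paper's learned encoder or classifier.

```python
# Hedged sketch of a pre-generation text safety check in the spirit of
# DiffGuard: score the prompt BEFORE generation and block it if any harm
# category exceeds a threshold. All names and values here are illustrative
# assumptions, not the paper's actual model or taxonomy.
from dataclasses import dataclass

# Hypothetical harm categories (the paper's fine-grained classes are not
# reproduced here).
_LEXICON = {
    "violence": {"attack", "weapon", "kill"},
    "explicit": {"nude", "explicit"},
    "disinformation": {"fake", "hoax"},
}

@dataclass
class Verdict:
    blocked: bool
    scores: dict  # per-category harm score in [0, 1]

def score_prompt(prompt: str) -> dict:
    """Toy stand-in for a learned text encoder + classification head."""
    tokens = set(prompt.lower().split())
    return {
        cat: min(1.0, len(tokens & words) / 2)
        for cat, words in _LEXICON.items()
    }

def check_prompt(prompt: str, threshold: float = 0.5) -> Verdict:
    """Intercept the prompt pre-generation: block if any score >= threshold."""
    scores = score_prompt(prompt)
    return Verdict(
        blocked=any(s >= threshold for s in scores.values()),
        scores=scores,
    )
```

In a real deployment, only prompts for which `check_prompt(...).blocked` is false would be forwarded to the diffusion pipeline, so no harmful image is ever synthesized or post-processed.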

📝 Abstract
Recent advances in Diffusion Models have enabled the generation of images from text, with powerful closed-source models like DALL-E and Midjourney leading the way. However, open-source alternatives, such as StabilityAI's Stable Diffusion, offer comparable capabilities. These open-source models, hosted on Hugging Face, come equipped with ethical filter protections designed to prevent the generation of explicit images. This paper first reveals their limitations and then presents a novel text-based safety filter that outperforms existing solutions. Our research is driven by the critical need to address the misuse of AI-generated content, especially in the context of information warfare. DiffGuard enhances filtering efficacy, surpassing the best existing filters by over 14%.
Problem

Research questions and friction points this paper is trying to address.

Enhance safety in open-source diffusion models
Address limitations of existing ethical filters
Prevent misuse in AI-generated content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-based safety filter
Enhances filtering efficacy
Surpasses the performance of existing filters
Massine El Khader
Université Paris-Saclay, CentraleSupélec
Elias Al Bouzidi
Université Paris-Saclay, CentraleSupélec
Abdellah Oumida
Université Paris-Saclay, CentraleSupélec
Mohammed Sbaihi
Université Paris-Saclay, CentraleSupélec
Eliott Binard
Université Paris-Saclay, CentraleSupélec
Jean-Philippe Poli
Université Paris-Saclay, CentraleSupélec
Wassila Ouerdane
Université Paris-Saclay, CentraleSupélec
B. Addad
Thales SIX GTS France
Katarzyna Kapusta
Thales ThereSIS
AI security · privacy-preserving ML · data protection