DiffGuard: Text-Based Safety Checker for Diffusion Models

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the security vulnerability of open-source diffusion models (e.g., Stable Diffusion) in text-to-image generation, where malicious prompts can induce harmful content, a risk that is particularly acute in high-stakes misuse scenarios such as information warfare. We propose an end-to-end trainable, purely textual safety filter. Our approach is the first to systematically expose severe deficiencies in the built-in ethical filters of mainstream models. It introduces a novel text encoder that integrates contrastive learning with multi-granularity semantic modeling, coupled with adversarial prompt detection and fine-grained harm classification, enabling real-time, pre-generation risk interception without image synthesis or post-hoc processing. Experiments demonstrate that our method achieves over 14% higher cross-model and multilingual filtering accuracy than state-of-the-art baselines, reduces false positive rates by 22%, and supports zero-shot transfer to unseen models and emerging harm categories.
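The summary above describes a purely textual filter that scores a prompt and intercepts it before any image synthesis. The minimal sketch below illustrates that control flow only; the harm taxonomy, scoring lexicon, and threshold are hypothetical placeholders and do not reproduce the paper's learned encoder or classifier.

```python
# Hedged sketch of a pre-generation text safety check in the spirit of
# DiffGuard: score the prompt BEFORE generation and block it if any harm
# category exceeds a threshold. All names and values here are illustrative
# assumptions, not the paper's actual model or taxonomy.
from dataclasses import dataclass

# Hypothetical harm categories (the paper's fine-grained classes are not
# reproduced here).
_LEXICON = {
    "violence": {"attack", "weapon", "kill"},
    "explicit": {"nude", "explicit"},
    "disinformation": {"fake", "hoax"},
}

@dataclass
class Verdict:
    blocked: bool
    scores: dict  # per-category harm score in [0, 1]

def score_prompt(prompt: str) -> dict:
    """Toy stand-in for a learned text encoder + classification head."""
    tokens = set(prompt.lower().split())
    return {
        cat: min(1.0, len(tokens & words) / 2)
        for cat, words in _LEXICON.items()
    }

def check_prompt(prompt: str, threshold: float = 0.5) -> Verdict:
    """Intercept the prompt pre-generation: block if any score >= threshold."""
    scores = score_prompt(prompt)
    return Verdict(
        blocked=any(s >= threshold for s in scores.values()),
        scores=scores,
    )
```

In a real deployment, only prompts for which `check_prompt(...).blocked` is false would be forwarded to the diffusion pipeline, so no harmful image is ever synthesized or post-processed.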

📝 Abstract
Recent advances in Diffusion Models have enabled the generation of images from text, with powerful closed-source models like DALL-E and Midjourney leading the way. However, open-source alternatives, such as StabilityAI's Stable Diffusion, offer comparable capabilities. These open-source models, hosted on Hugging Face, come equipped with ethical filter protections designed to prevent the generation of explicit images. This paper first reveals their limitations and then presents a novel text-based safety filter that outperforms existing solutions. Our research is driven by the critical need to address the misuse of AI-generated content, especially in the context of information warfare. DiffGuard enhances filtering efficacy, surpassing the best existing filters by over 14%.
Problem

Research questions and friction points this paper is trying to address.

Enhance safety in open-source diffusion models
Address limitations of existing ethical filters
Prevent misuse in AI-generated content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-based safety filter
Enhances filtering efficacy
Surpasses the performance of existing filters
Massine El Khader
Université Paris-Saclay, CentraleSupélec
Elias Al Bouzidi
Université Paris-Saclay, CentraleSupélec
Abdellah Oumida
Université Paris-Saclay, CentraleSupélec
Mohammed Sbaihi
Université Paris-Saclay, CentraleSupélec
Eliott Binard
Université Paris-Saclay, CentraleSupélec
Jean-Philippe Poli
Université Paris-Saclay, CentraleSupélec
Wassila Ouerdane
Université Paris-Saclay, CentraleSupélec
B. Addad
Thales SIX GTS France
Katarzyna Kapusta
Thales ThereSIS
AI security · privacy-preserving ML · data protection