Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of precisely erasing specific inappropriate or copyright-protected concepts—such as celebrities, sensitive content, or trademarks—from text-to-image diffusion models. To avoid costly full-model fine-tuning, we propose a lightweight concept erasure method operating locally within cross-attention layers. Our approach introduces a nonlinear residual attention gate (ResAG) and an attention anchoring loss, jointly optimized with learnable text embeddings via adversarial iterative refinement to selectively suppress target concepts. Crucially, it preserves semantic integrity and generative diversity of non-target concepts while mitigating catastrophic forgetting. Experiments demonstrate significant improvements over state-of-the-art methods on erasure tasks involving celebrities, artistic styles, and sensitive content—achieving superior erasure completeness, high distribution fidelity, and robustness across diverse prompts and model variants.

📝 Abstract
Remarkable progress in text-to-image diffusion models has raised a major concern about potentially generating images of inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts from diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding *linear* modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding *nonlinear* Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions, employing an attention anchoring loss to prevent forgetting. Moreover, we adversarially train CPE with ResAG and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on the erasure of celebrities, artistic styles, and explicit content demonstrate that the proposed CPE outperforms prior art by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts. Code is available at https://github.com/Hyun1A/CPE
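The core mechanism described above, a frozen cross-attention projection augmented with a small nonlinear gated residual, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`resag`, `gated_projection`), the low-rank tanh path, and the scalar sigmoid gate are all assumptions made for clarity; see the official repository for the actual ResAG design.

```python
import numpy as np

def resag(x, W_down, W_up, W_gate, b_gate):
    """Hypothetical sketch of a nonlinear Residual Attention Gate (ResAG).

    x is a text-token embedding entering a cross-attention key/value
    projection. The residual is near zero unless the learned gate
    "opens" for target-concept embeddings, so remaining concepts pass
    through almost unchanged.
    """
    gate = 1.0 / (1.0 + np.exp(-(W_gate @ x + b_gate)))  # scalar sigmoid gate
    residual = W_up @ np.tanh(W_down @ x)                # low-rank nonlinear path
    return gate * residual

def gated_projection(x, W, W_down, W_up, W_gate, b_gate):
    # frozen linear projection plus the learned nonlinear residual
    return W @ x + resag(x, W_down, W_up, W_gate, b_gate)
```

The point of the nonlinearity is visible here: a purely linear residual would shift *all* inputs, while the gate lets the module act only on embeddings near the target concept.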
Problem

Research questions and friction points this paper is trying to address.

Erase target concepts in diffusion models without distorting others
Improve concept preservation via nonlinear Residual Attention Gates
Enhance robustness against adversarial attacks during concept erasure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonlinear Residual Attention Gates for selective erasing
Attention anchoring loss safeguards remaining concepts
Adversarial training enhances robustness and erasing performance
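The attention anchoring idea in the bullets above can be sketched as a penalty that keeps the edited model's cross-attention maps for anchor (non-target) prompts close to the original model's maps. This is a hedged sketch under assumptions: the squared-distance form, the function names, and the use of raw attention maps are illustrative choices, not the paper's exact loss.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_anchoring_loss(Q, K_orig, K_edited):
    """Hypothetical attention anchoring loss sketch.

    Q: query features for an anchor (non-target) prompt.
    K_orig / K_edited: key projections before and after adding ResAG.
    Penalizes drift of the attention map to mitigate forgetting of
    remaining concepts.
    """
    scale = np.sqrt(Q.shape[-1])
    A_orig = softmax(Q @ K_orig.T / scale)
    A_edit = softmax(Q @ K_edited.T / scale)
    return np.mean((A_edit - A_orig) ** 2)
```

When the edited keys equal the original keys the loss is exactly zero, so the penalty only activates when the added module actually perturbs attention on anchor concepts.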
Byung Hyun Lee
Department of ECE, Seoul National University
Sungjin Lim
IPAI, Seoul National University
Seunggyu Lee
Department of ECE, Seoul National University
Dong Un Kang
PhD student, Seoul National University
Deep learning
Se Young Chun
Department of Electrical and Computer Engineering, Seoul National University
Computational imaging, machine learning, signal processing, multimodal processing