Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of precisely erasing specific inappropriate or copyright-protected concepts—such as celebrities, sensitive content, or trademarks—from text-to-image diffusion models. To avoid costly full-model fine-tuning, we propose a lightweight concept erasure method operating locally within cross-attention layers. Our approach introduces a nonlinear residual attention gate (ResAG) and an attention anchoring loss, jointly optimized with learnable text embeddings via adversarial iterative refinement to selectively suppress target concepts. Crucially, it preserves semantic integrity and generative diversity of non-target concepts while mitigating catastrophic forgetting. Experiments demonstrate significant improvements over state-of-the-art methods on erasure tasks involving celebrities, artistic styles, and sensitive content—achieving superior erasure completeness, high distribution fidelity, and robustness across diverse prompts and model variants.

📝 Abstract
Remarkable progress in text-to-image diffusion models has raised a major concern about potentially generating images of inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts from diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding *linear* modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding *nonlinear* Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions, employing an attention anchoring loss to prevent forgetting. Moreover, we adversarially train CPE with ResAG and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on the erasure of celebrities, artistic styles, and explicit content demonstrate that the proposed CPE outperforms prior art by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts. Code is available at https://github.com/Hyun1A/CPE
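The core mechanism described above, a frozen cross-attention projection augmented with a small nonlinear gated residual, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`resag`, `gated_projection`), the low-rank tanh path, and the scalar sigmoid gate are all assumptions made for clarity; see the official repository for the actual ResAG design.

```python
import numpy as np

def resag(x, W_down, W_up, W_gate, b_gate):
    """Hypothetical sketch of a nonlinear Residual Attention Gate (ResAG).

    x is a text-token embedding entering a cross-attention key/value
    projection. The residual is near zero unless the learned gate
    "opens" for target-concept embeddings, so remaining concepts pass
    through almost unchanged.
    """
    gate = 1.0 / (1.0 + np.exp(-(W_gate @ x + b_gate)))  # scalar sigmoid gate
    residual = W_up @ np.tanh(W_down @ x)                # low-rank nonlinear path
    return gate * residual

def gated_projection(x, W, W_down, W_up, W_gate, b_gate):
    # frozen linear projection plus the learned nonlinear residual
    return W @ x + resag(x, W_down, W_up, W_gate, b_gate)
```

The point of the nonlinearity is visible here: a purely linear residual would shift *all* inputs, while the gate lets the module act only on embeddings near the target concept.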
Problem

Research questions and friction points this paper is trying to address.

Erase target concepts in diffusion models without distorting others
Improve concept preservation via nonlinear Residual Attention Gates
Enhance robustness against adversarial attacks during concept erasure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonlinear Residual Attention Gates for selective erasing
Attention anchoring loss safeguards remaining concepts
Adversarial training enhances robustness and erasing performance
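The attention anchoring idea in the bullets above can be sketched as a penalty that keeps the edited model's cross-attention maps for anchor (non-target) prompts close to the original model's maps. This is a hedged sketch under assumptions: the squared-distance form, the function names, and the use of raw attention maps are illustrative choices, not the paper's exact loss.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_anchoring_loss(Q, K_orig, K_edited):
    """Hypothetical attention anchoring loss sketch.

    Q: query features for an anchor (non-target) prompt.
    K_orig / K_edited: key projections before and after adding ResAG.
    Penalizes drift of the attention map to mitigate forgetting of
    remaining concepts.
    """
    scale = np.sqrt(Q.shape[-1])
    A_orig = softmax(Q @ K_orig.T / scale)
    A_edit = softmax(Q @ K_edited.T / scale)
    return np.mean((A_edit - A_orig) ** 2)
```

When the edited keys equal the original keys the loss is exactly zero, so the penalty only activates when the added module actually perturbs attention on anchor concepts.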
Byung Hyun Lee
Department of ECE, Seoul National University
Sungjin Lim
IPAI, Seoul National University
Seunggyu Lee
Department of ECE, Seoul National University
Dong Un Kang
PhD student, Seoul National University
Deep learning
Se Young Chun
Department of Electrical and Computer Engineering, Seoul National University
Computational imaging, machine learning, signal processing, multimodal processing