🤖 AI Summary
Existing concept erasure methods for text-to-image diffusion models (e.g., Stable Diffusion) struggle to simultaneously remove harmful concepts and preserve generative capability, leaving the models exposed to significant safety risks from misuse.
Method: We propose a key-step concept erasure method that leverages the iterative denoising nature of diffusion processes. By performing concept sensitivity analysis, we identify the most influential denoising timesteps for generating the target concept and apply targeted fine-tuning exclusively at those steps. Our approach designs task-specific key-step selection strategies, drastically reducing parameter updates while enhancing both erasure precision and generation fidelity.
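The key-step idea above can be sketched in a few lines: score each denoising timestep by its sensitivity to the target concept, pick the top-k as key steps, and restrict fine-tuning updates to those steps. This is a minimal illustrative sketch; the sensitivity scores, the choice of k, and the function names here are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of key-step selection for concept erasure.
# Assumptions: sensitivity scores per timestep are given (in practice they
# would come from a concept sensitivity analysis); names are hypothetical.

def select_key_steps(sensitivity, k):
    """Return the k denoising timesteps with the highest concept sensitivity,
    in ascending timestep order."""
    ranked = sorted(range(len(sensitivity)),
                    key=lambda t: sensitivity[t], reverse=True)
    return sorted(ranked[:k])

def finetune_schedule(total_steps, key_steps):
    """Only timesteps in key_steps receive parameter updates; all other
    denoising steps are left untouched, preserving generative capability."""
    keys = set(key_steps)
    return [t for t in range(total_steps) if t in keys]

# Toy example: 10 denoising steps, with sensitivity to the target concept
# concentrated in the early (high-noise) steps.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.02]
key = select_key_steps(scores, k=3)
print(key)                        # timesteps chosen for targeted fine-tuning
print(finetune_schedule(10, key)) # only these steps get parameter updates
```

With k = 3 of 10 steps, only 30% of the denoising trajectory is touched during fine-tuning, which is the mechanism by which the method cuts parameter updates while keeping the rest of the generative process intact.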
Results: Experiments across multiple benchmarks demonstrate that our method effectively blocks harmful content generation while outperforming state-of-the-art erasure techniques in text-image alignment, image quality, and other key metrics—thereby better preserving the model’s original generative performance.
📝 Abstract
Text-to-image diffusion models (T2I DMs) such as Stable Diffusion generate highly realistic images from textual input and have been widely adopted. However, their misuse poses serious security risks. Existing concept unlearning methods aim to mitigate these risks, but they struggle to balance unlearning effectiveness with generative retainability. To overcome this limitation, we propose the Key Step Concept Unlearning (KSCU) method, which exploits the stepwise sampling process inherent to diffusion models during image generation. Unlike conventional approaches that treat all denoising steps equally, KSCU focuses on the pivotal steps with the most influence on the final output: it identifies key steps for each concept unlearning task and fine-tunes the model only at those steps. This targeted approach reduces the number of parameter updates needed for effective unlearning while maximizing retention of the model's generative capabilities. Extensive benchmark experiments demonstrate that KSCU effectively prevents T2I DMs from generating undesirable images while better retaining the model's generative capabilities. Our code will be released.