CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing concept-erasure methods for text-guided diffusion models are limited by the representational capacity of the textual space and depend on predefined erasure references, so a poorly chosen reference can degrade overall generation quality. This work proposes CoreUnlearn, a component-level disentangled unlearning mechanism that decomposes concept embeddings into erasure-critical and non-critical components using a Component Extraction Module (CEM) pre-trained under a Swap Disentangling Strategy (SDS). The model weights are then fine-tuned to remove only the erasure-critical component, without requiring any predefined reference. The method erases the target concept while largely preserving the model’s general generative capability, and it is reported to outperform current alignment-based fine-tuning approaches.

📝 Abstract

Text-guided diffusion models have revolutionized image synthesis but also raise ethical concerns, such as privacy violations and harmful content generation. To mitigate these issues, prevailing methods typically use an alignment mechanism with predefined erasure references to fine-tune pretrained model weights. However, these techniques are intrinsically limited by the representational capacity of the textual space and are highly sensitive to the choice of predefined erasure references; for example, a suboptimal reference may significantly degrade the model's utility during erasure. To overcome these limitations, we introduce CoreUnlearn, which aims to disentangle and remove the erasure-critical component of the undesirable concept. Specifically, CoreUnlearn comprises a Component Extraction Module (CEM) and a Swap Disentangling Strategy (SDS). Guided by SDS, CEM is pre-trained to decompose concept embeddings into distinct component types. Leveraging this decomposition, CoreUnlearn then fine-tunes the model weights to remove the erasure-critical component while retaining the non-critical ones. Extensive experiments demonstrate that CoreUnlearn achieves effective concept erasure with minimal impact on overall model performance.
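
The decompose, swap, and remove idea in the abstract can be illustrated with a toy sketch. This is not the paper's method: the abstract describes CEM as a learned module and gives no architecture or loss, so the sketch substitutes a fixed linear projection onto a single hypothetical "critical" direction. The function names (`decompose`, `swap_critical`, `erasure_target`) and the direction vector are all assumptions made for illustration.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def decompose(embedding, critical_direction):
    """Split an embedding into a critical and a non-critical part.

    Toy stand-in for a learned extractor: the critical part is the
    projection onto one assumed direction, the rest is non-critical.
    """
    u = unit(critical_direction)
    coeff = sum(e * x for e, x in zip(embedding, u))
    critical = [coeff * x for x in u]
    non_critical = [e - c for e, c in zip(embedding, critical)]
    return critical, non_critical


def swap_critical(emb_a, emb_b, critical_direction):
    """Exchange the critical parts of two embeddings.

    Hypothetical illustration of a swap: if the split is clean, each
    result carries the other concept's critical content while keeping
    its own non-critical content.
    """
    crit_a, non_a = decompose(emb_a, critical_direction)
    crit_b, non_b = decompose(emb_b, critical_direction)
    swapped_a = [c + n for c, n in zip(crit_b, non_a)]
    swapped_b = [c + n for c, n in zip(crit_a, non_b)]
    return swapped_a, swapped_b


def erasure_target(embedding, critical_direction):
    """Drop the critical part and keep only the non-critical part."""
    _, non_critical = decompose(embedding, critical_direction)
    return non_critical


# Two-dimensional example where the first axis is assumed to be critical.
direction = [1.0, 0.0]
concept = [3.0, 4.0]
other = [1.0, 2.0]
target = erasure_target(concept, direction)
swapped_concept, swapped_other = swap_critical(concept, other, direction)
```

In this toy setting, `target` keeps only the part of `concept` that is orthogonal to the assumed critical direction. In the method described above, such a target would guide fine-tuning of the diffusion model's weights instead of being used directly, and the decomposition would come from the pre-trained CEM, not from a fixed projection.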

Problem

Research questions and friction points this paper is trying to address.

concept unlearning

diffusion models

privacy violation

harmful content generation

text-guided image synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept unlearning

disentangled representation

component-level erasure