🤖 AI Summary
This work addresses the challenge of selectively forgetting harmful or sensitive concepts in large-scale diffusion models, efficiently and at scale. The authors propose a hypernetwork-based approach that dynamically generates LoRA weights conditioned on CLIP semantic embeddings, enabling context-aware adaptation for both Stable Diffusion and flow-based text-to-image models. By using CLIP-guided hypernetworks to modulate LoRA parameters, the method achieves precise erasure of single or multiple target concepts—such as object removal, celebrity erasure, and sensitive content filtering—while preserving the model’s overall generative capabilities. The study presents the first integration of hypernetworks with CLIP guidance for dynamic LoRA modulation, improving forgetting accuracy, semantic fidelity, and multi-concept scalability over existing approaches.
📝 Abstract
Recent advances in large-scale diffusion models have intensified concerns about their potential misuse, particularly in generating realistic yet harmful or socially disruptive content. This challenge has spurred growing interest in effective machine unlearning, the process of selectively removing specific knowledge or concepts from a model without compromising its overall generative capabilities. Among various approaches, Low-Rank Adaptation (LoRA) has emerged as an effective and efficient method for fine-tuning models toward targeted unlearning. However, LoRA-based methods often exhibit limited adaptability to concept semantics and struggle to balance removing closely related concepts with maintaining generalization across broader meanings. Moreover, these methods face scalability challenges when multiple concepts must be erased simultaneously. To address these limitations, we introduce UnHype, a framework that incorporates hypernetworks into single- and multi-concept LoRA training. The proposed architecture can be directly plugged into Stable Diffusion as well as modern flow-based text-to-image models, where it demonstrates stable training behavior and effective concept control. During inference, the hypernetwork dynamically generates adaptive LoRA weights based on the CLIP embedding, enabling more context-aware, scalable unlearning. We evaluate UnHype across several challenging tasks, including object erasure, celebrity erasure, and explicit content removal, demonstrating its effectiveness and versatility. Repository: https://github.com/gmum/UnHype.