🤖 AI Summary
This paper addresses the problem of controllable concept erasure in unified autoregressive multimodal models. To balance erasure efficacy and generation fidelity, we propose a fine-tuning method featuring: (1) windowed gradient accumulation to mitigate gradient conflicts induced by concept erasure; (2) threshold-based loss masking to selectively suppress high-confidence erroneous generations; and (3) structured prompt generation jointly optimized with a vision classifier to enhance semantic consistency during erasure. We introduce ECGVF—the first multi-dimensional benchmark for concept erasure evaluation—assessing Erasure, Consistency, Generation quality, and Vision-based Filtering. Experiments on Janus-Pro demonstrate that our method achieves a +28.6% improvement in target concept erasure rate while preserving text generation quality (BLEU drop <1.2) and image generation fidelity (CLIP Score drop <0.8), confirming a Pareto-optimal trade-off between utility preservation and concept removal.
📝 Abstract
Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR models. Specifically, we introduce Windowed Gradient Accumulation (WGA) strategy to align patch-level decoding with erasure objectives, and Thresholded Loss Masking (TLM) strategy to protect content unrelated to the target concept during fine-tuning. Furthermore, we propose a novel benchmark, Erase Concept Generator and Visual Filter (ECGVF), aim at provide a more rigorous and comprehensive foundation for evaluating concept erasure in AR models. Specifically, we first employ structured templates across diverse large language models (LLMs) to pre-generate a large-scale corpus of target-replacement concept prompt pairs. Subsequently, we generate images from these prompts and subject them to rigorous filtering via a visual classifier to ensure concept fidelity and alignment. Extensive experimental results conducted on the ECGVF benchmark with the AR model Janus-Pro demonstrate that EAR achieves marked improvements in both erasure effectiveness and model utility preservation. Code is available at: https://github.com/immc-lab/ear/