EAR: Erasing Concepts from Unified Autoregressive Models

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of controllable concept erasure in unified autoregressive multimodal models. To balance erasure efficacy and generation fidelity, we propose a fine-tuning method featuring: (1) windowed gradient accumulation to mitigate gradient conflicts induced by concept erasure; (2) threshold-based loss masking to selectively suppress high-confidence erroneous generations; and (3) structured prompt generation jointly optimized with a vision classifier to enhance semantic consistency during erasure. We introduce ECGVF—the first multi-dimensional benchmark for concept erasure evaluation—assessing Erasure, Consistency, Generation quality, and Vision-based Filtering. Experiments on Janus-Pro demonstrate that our method achieves a +28.6% improvement in target concept erasure rate while preserving text generation quality (BLEU drop <1.2) and image generation fidelity (CLIP Score drop <0.8), confirming a Pareto-optimal trade-off between utility preservation and concept removal.
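The windowed gradient accumulation idea described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the scalar per-token gradients, and the averaging rule are all assumptions made for clarity.

```python
def windowed_gradient_accumulation(token_grads, window_size):
    """Accumulate per-token gradients over fixed windows of patch tokens.

    Rather than applying every token-level gradient immediately (where
    conflicting erasure and retention signals can interfere), gradients
    are summed within each window and one averaged update is emitted per
    window. `token_grads` is a flat list of scalar gradients, one per
    decoded image patch token.
    """
    updates = []
    for start in range(0, len(token_grads), window_size):
        window = token_grads[start:start + window_size]
        # One averaged update per window smooths out intra-window conflicts.
        updates.append(sum(window) / len(window))
    return updates
```

In a real fine-tuning loop the same idea would apply to full parameter gradients (e.g. deferring `optimizer.step()` until a window of patch tokens has been decoded), with the window size chosen to match the patch-level decoding granularity.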

📝 Abstract
Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR models. Specifically, we introduce a Windowed Gradient Accumulation (WGA) strategy to align patch-level decoding with erasure objectives, and a Thresholded Loss Masking (TLM) strategy to protect content unrelated to the target concept during fine-tuning. Furthermore, we propose a novel benchmark, Erase Concept Generator and Visual Filter (ECGVF), aimed at providing a more rigorous and comprehensive foundation for evaluating concept erasure in AR models. To build it, we first employ structured templates across diverse large language models (LLMs) to pre-generate a large-scale corpus of target-replacement concept prompt pairs. We then generate images from these prompts and subject them to rigorous filtering via a visual classifier to ensure concept fidelity and alignment. Extensive experiments on the ECGVF benchmark with the AR model Janus-Pro demonstrate that EAR achieves marked improvements in both erasure effectiveness and model utility preservation. Code is available at: https://github.com/immc-lab/ear/
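The ECGVF benchmark pipeline described in the abstract (generate prompt pairs, render images, filter with a visual classifier) can be sketched as a simple filtering loop. This is a hedged sketch only: `generate` and `classify` are hypothetical stand-ins for the image generator and the vision classifier, and the confidence threshold is an assumed parameter.

```python
def filter_prompt_pairs(prompt_pairs, generate, classify, min_conf=0.8):
    """Keep only (target, replacement) prompt pairs whose generated image
    the visual classifier confirms actually depicts the target concept.

    `generate(prompt)` returns an image; `classify(image)` returns a
    (label, confidence) pair. Both are hypothetical interfaces standing in
    for the ECGVF pipeline components.
    """
    kept = []
    for target_prompt, replacement_prompt in prompt_pairs:
        image = generate(target_prompt)
        label, conf = classify(image)
        # Discard pairs where the concept did not render faithfully.
        if label == "target" and conf >= min_conf:
            kept.append((target_prompt, replacement_prompt))
    return kept
```

The point of the filter is that erasure evaluation is only meaningful on prompts that reliably produce the target concept in the first place; low-confidence renders would otherwise inflate the measured erasure rate.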
Problem

Research questions and friction points this paper is trying to address.

Remove undesired concepts from AR models while maintaining quality
Align patch-level decoding with erasure objectives effectively
Evaluate concept erasure rigorously with a novel benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Windowed Gradient Accumulation for patch-level erasure
Thresholded Loss Masking to protect unrelated content
ECGVF benchmark for rigorous concept erasure evaluation
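The thresholded loss masking strategy listed above can be sketched as follows. This is a minimal sketch under assumed interfaces, not the authors' implementation: the per-token loss list, the target-concept probabilities, and the threshold `tau` are illustrative assumptions.

```python
def thresholded_loss_mask(token_losses, target_probs, tau=0.5):
    """Apply the erasure loss only where the model confidently generates
    the target concept, zeroing it elsewhere.

    `token_losses` are per-token erasure losses; `target_probs` are the
    model's probabilities of emitting the target concept at each token.
    Masking low-confidence positions leaves unrelated content untouched
    during fine-tuning.
    """
    return [loss if p >= tau else 0.0
            for loss, p in zip(token_losses, target_probs)]
```

In practice the masked losses would be summed and backpropagated, so only high-confidence target-concept tokens contribute gradient signal.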
Haipeng Fan, Inner Mongolia University
Shiyuan Zhang, Inner Mongolia University
Baohunesitu, Inner Mongolia University
Zihang Guo, Inner Mongolia University
Huaiwen Zhang, Northeastern University