GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
LLMs face a critical challenge in machine unlearning: targeted data deletion often triggers "unintended forgetting," degrading model performance on retained knowledge. Motivated by regulatory requirements for data removal, this paper proposes a data-attribution-guided unlearning framework, GUARD. Its core contributions are: (1) a lightweight proxy data attribution metric that quantifies the alignment between the forget and retain sets while remaining computationally efficient; and (2) an adaptive, non-uniform unlearning-weight mechanism, backed by theoretical guarantees, that jointly optimizes forgetting efficacy and retention fidelity. Evaluated on the TOFU benchmark, the method reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when unlearning 10% of the training data, substantially outperforming prior approaches while maintaining computational efficiency.

📝 Abstract
Unlearning in large language models (LLMs) is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the "alignment" between the forget and retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended losses in retention. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU benchmark across multiple LLM architectures demonstrate that GUARD substantially improves utility preservation while ensuring effective unlearning. Notably, GUARD reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when forgetting 10% of the training data.
Problem

Research questions and friction points this paper is trying to address.

Addresses unintended forgetting in LLM unlearning processes
Explores data-level factors affecting unlearning performance
Improves retention of valuable information during unlearning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight proxy data attribution metric
Adaptive nonuniform unlearning weights
Enhanced retention with theoretical guarantees
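To make the weighting idea concrete, here is a minimal sketch of inverse-attribution weighting as described in the abstract. The cosine-similarity proxy and all function names here are illustrative assumptions, not the paper's actual metric: forget samples that align strongly with the retain set receive less unlearning weight, which is the mechanism GUARD uses to reduce unintended forgetting.

```python
import numpy as np

def proxy_attribution_scores(forget_embs, retain_embs):
    # Illustrative proxy (assumption): mean cosine similarity of each
    # forget sample to the retain set. The paper's actual metric differs.
    f = forget_embs / np.linalg.norm(forget_embs, axis=1, keepdims=True)
    r = retain_embs / np.linalg.norm(retain_embs, axis=1, keepdims=True)
    return (f @ r.T).mean(axis=1)  # shape: (n_forget,)

def adaptive_unlearning_weights(scores, eps=1e-6):
    # Weights inversely proportional to (shifted) attribution scores:
    # higher alignment with retained knowledge -> smaller weight.
    shifted = scores - scores.min() + 1.0  # keep strictly positive
    w = 1.0 / (shifted + eps)
    return w / w.sum()  # normalize so weights sum to 1

# A forget sample nearly parallel to the retain set gets down-weighted,
# while an orthogonal one receives most of the unlearning power.
retain = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]])
forget = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = proxy_attribution_scores(forget, retain)
weights = adaptive_unlearning_weights(scores)
```

These weights would then scale the per-sample unlearning loss, reallocating unlearning power away from samples whose removal would most damage retained knowledge.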
👥 Authors
Evelyn Ma — University of Illinois Urbana-Champaign
Duo Zhou — Computer Science, University of Illinois Urbana-Champaign
Peizhi Niu — University of Illinois Urbana-Champaign
Huiting Zhou — University of Illinois Urbana-Champaign
Huan Zhang — University of Illinois Urbana-Champaign
Olgica Milenkovic — University of Illinois
S. Etesami — University of Illinois Urbana-Champaign