On Large Language Model Continual Unlearning

📅 2024-07-14
📈 Citations: 4
Influential: 0
🤖 AI Summary
To address the dual challenges of cumulative utility degradation and inaccessibility of historical data in continual data forgetting for large language models (LLMs), this paper proposes the first data-free forgetting framework tailored for dynamic, high-frequency forgetting requests. Methodologically: (1) it introduces Orthogonal Low-Rank Adapters (Orthogonal LoRA) to decouple parameters across multiple forgetting rounds; (2) it designs a contrastive entropy loss-based out-of-distribution (OOD) detector with global-local awareness, enabling adaptive forgetting strength calibration without access to original training data; and (3) it incorporates an inference-time forgetting gating mechanism to ensure real-time, controllable forgetting. Extensive experiments across three task categories and seven benchmark datasets demonstrate that the method achieves 12.6% higher forgetting accuracy than state-of-the-art approaches while preserving 94.3% of task performance on average, striking a strong balance between security and practical utility.
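The parameter decoupling in point (1) can be illustrated with a minimal numerical sketch. The penalty below, its name, and the choice of applying it to the LoRA down-projection matrices are assumptions for illustration, not details taken from the paper: it measures overlap between the current round's low-rank update and those from earlier rounds, and is zero exactly when their row spaces are orthogonal, which is the disentanglement goal.

```python
import numpy as np

def orthogonality_penalty(new_A: np.ndarray, past_As: list) -> float:
    """Hypothetical orthogonality penalty: squared Frobenius norm of the
    cross-Gram matrix between the current round's LoRA down-projection
    (shape r x d) and each earlier round's. Adding this term to the
    unlearning objective pushes each new adapter into a subspace
    orthogonal to all previous ones."""
    return float(sum(np.sum((new_A @ past_A.T) ** 2) for past_A in past_As))

# Toy check: rank-2 adapters over an 8-dim hidden size.
past = np.zeros((2, 8))
past[0, 0] = past[1, 1] = 1.0          # spans axes 0 and 1
new = np.zeros((2, 8))
new[0, 2] = new[1, 3] = 1.0            # spans axes 2 and 3 -> penalty 0
print(orthogonality_penalty(new, [past]))
```

In an actual training loop this penalty would be a differentiable regularizer (e.g. in PyTorch) added to the forgetting loss for the current round.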

📝 Abstract
While large language models have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning has emerged as a representative approach for model safety and security by removing the influence of undesired data on the target model. However, these methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging, especially in the context of LLMs, which may lead to accumulated model utility loss that eventually becomes unacceptable. Moreover, existing LLM unlearning methods often ignore previous data access limitations due to privacy concerns and copyright protection. Without previous data, the utility preservation during unlearning is much harder. To overcome these challenges, we propose the OOO framework that includes an Orthogonal low-rank adapter (LoRA) for continually unlearning requested data and an Out-Of-Distribution (OOD) detector to measure the similarity between input and unlearning data. The orthogonal LoRA achieves parameter disentanglement among continual unlearning requests. The OOD detector is trained with a novel contrastive entropy loss and utilizes a glocal-aware scoring mechanism. During inference, our OOO framework can decide whether and to what extent to load the unlearning LoRA based on the OOD detector's predicted similarity between the input and the unlearned knowledge. Notably, OOO's effectiveness does not rely on any retained data. We conducted extensive experiments on OOO and state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results indicate that OOO consistently achieves the best unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests. The source codes can be found at https://github.com/GCYZSL/O3-LLM-UNLEARNING.
Problem

Research questions and friction points this paper is trying to address.

Addresses continuous unlearning in large language models.
Overcomes utility loss from repeated unlearning requests.
Ensures privacy without retaining previous data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal LoRA for continual unlearning
OOD detector with contrastive entropy loss
Glocal-aware scoring mechanism for similarity
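The inference-time gating built on these components can be sketched as follows. The sigmoid gate, the threshold `tau`, the temperature `temp`, and both function names are illustrative assumptions rather than the paper's actual formulation; the sketch only shows the stated idea that the OOD detector's similarity score decides whether, and to what extent, the unlearning LoRA is loaded.

```python
import numpy as np

def load_weight(ood_score: float, tau: float = 0.5, temp: float = 10.0) -> float:
    """Map an OOD-detector score (higher = input closer to the unlearned
    data) to a soft gate in [0, 1]. tau and temp are made-up
    hyperparameters for illustration."""
    return float(1.0 / (1.0 + np.exp(-temp * (ood_score - tau))))

def gated_forward(x, W, lora_B, lora_A, ood_score):
    """Base layer output plus the unlearning-LoRA delta, scaled by the
    gate. For in-distribution inputs unrelated to the unlearned knowledge
    (low score) the adapter is effectively bypassed, preserving the
    original model's behavior without any retained data."""
    g = load_weight(ood_score)
    return x @ W.T + g * (x @ lora_A.T @ lora_B.T)

# Toy usage: 4-dim layer with a rank-2 adapter.
x = np.ones(4)
W = np.eye(4)
A, B = np.ones((2, 4)), np.ones((4, 2))
print(gated_forward(x, W, B, A, ood_score=0.0))  # near the base output
print(gated_forward(x, W, B, A, ood_score=1.0))  # adapter fully engaged
```

A smooth gate (rather than a hard on/off switch) matches the summary's description of calibrating forgetting *strength* adaptively per input.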