Offset Unlearning for Large Language Models

📅 2024-04-17
🏛️ arXiv.org
📈 Citations: 14
Influential: 4
🤖 AI Summary
To mitigate sensitive information leakage from black-box large language models (LLMs) caused by training data memorization—while complying with GDPR and other data protection regulations—this paper proposes δ-Unlearning, a novel offset-based unlearning framework. Unlike conventional approaches, δ-Unlearning requires neither access to model weights nor retention of original sensitive data. Its core innovation is the first-ever logits-difference modeling paradigm for offset learning, enabling gradient-free optimization of black-box API outputs via a lightweight proxy model. The framework is algorithm-agnostic, supporting integration with diverse unlearning methods, end-to-end knowledge distillation, and inference-time calibration. Evaluated across multiple benchmarks, δ-Unlearning achieves over 92% forgetting success rate while simultaneously improving average accuracy on general-purpose tasks by 0.8%, thereby reconciling strong privacy guarantees with high model utility.

📝 Abstract
Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, biased, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previous unlearning techniques are either not applicable to black-box LLMs due to required access to model internal weights, or violate data protection principles by retaining sensitive data for inference-time correction. We propose δ-Unlearning, an offset unlearning framework for black-box LLMs. Instead of tuning the black-box LLM itself, δ-Unlearning learns the logit offset needed for unlearning by contrasting the logits from a pair of smaller models. Experiments demonstrate that δ-Unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks. δ-Unlearning also effectively incorporates different unlearning algorithms, making our approach a versatile solution to adapting various existing unlearning algorithms to black-box LLMs.
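The logit-offset idea in the abstract can be sketched in a few lines: a small model is unlearned while its untouched base copy is kept, and the difference between their logits is added to the black-box LLM's logits at inference time. This is a minimal illustration of that contrast, not the paper's implementation; all function and variable names are hypothetical.

```python
# Hedged sketch of the δ-Unlearning offset: the black-box model's logits
# are shifted by the difference between a small unlearned model and its
# original base version. Names are illustrative, not from the paper's code.

def offset_unlearned_logits(blackbox_logits, small_base_logits, small_unlearned_logits):
    # delta captures how the unlearning objective shifted the small model's
    # output distribution for this token position
    delta = [u - b for u, b in zip(small_unlearned_logits, small_base_logits)]
    # apply the same shift on top of the black-box LLM's logits
    return [x + d for x, d in zip(blackbox_logits, delta)]
```

Because only logits are combined, the black-box model's weights are never touched, and no sensitive data needs to be retained at inference time.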
Problem

Research questions and friction points this paper is trying to address.

Addresses memorization of sensitive data in LLMs
Proposes unlearning without accessing model weights
Enables effective unlearning for black-box LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offset unlearning for black-box LLMs
Learns logit offset via smaller models
Versatile adaptation of unlearning algorithms