Not All Data Are Unlearned Equally

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges a core assumption in large language model (LLM) unlearning: that all data points to be forgotten are equally important and equally hard to remove. Empirical analysis shows that the frequency of a piece of knowledge in the pre-training data strongly modulates forgetting difficulty: high-frequency facts (e.g., commonsense) exhibit markedly lower forgetting success rates (reported as a >40% gap), whereas low-frequency sensitive information (e.g., phone numbers) is substantially easier to remove. Through controlled experiments on Llama and Mistral models, the study evaluates fine-tuning, ROME, and MEMIT under a multi-dimensional evaluation framework comprising logit probing, generation sampling, and counterfactual QA. It identifies a systematic misalignment between probability-based and generation-based evaluation metrics, a discrepancy that intensifies with model scale. The work argues for evaluation practices tightly coupled with data provenance and establishes frequency-aware principles for designing unlearning algorithms.
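The probability-vs-generation mismatch described above can be illustrated with a toy example. All numbers and token names below are invented for illustration, not taken from the paper: a probability-based check can declare unlearning successful while greedy decoding still emits the "forgotten" answer.

```python
# Toy illustration of the probability-vs-generation evaluation gap.
# Distributions and the 0.5 threshold are hypothetical, not from the paper.

def prob_eval(dist, target, threshold=0.5):
    """Probability-based check: the target token's probability fell below threshold."""
    return dist[target] < threshold

def gen_eval(dist, target):
    """Generation-based check: greedy decoding no longer picks the target token."""
    return max(dist, key=dist.get) != target

# Hypothetical next-token distribution for "Montreal is a city in ___"
# after unlearning: the target's probability dropped a lot, yet it is
# still the argmax, so greedy decoding keeps producing it.
after = {"Canada": 0.40, "France": 0.35, "Quebec": 0.25}

print(prob_eval(after, "Canada"))  # True: probability fell below the threshold
print(gen_eval(after, "Canada"))   # False: greedy decoding still outputs "Canada"
```

The two checks disagree on the same model state, which is exactly the kind of metric misalignment the paper reports growing with model scale.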

📝 Abstract
Machine unlearning is concerned with the task of removing knowledge learned from particular data points from a trained model. In the context of large language models (LLMs), unlearning has recently received increased attention, particularly for removing knowledge about named entities from models for privacy purposes. While various approaches have been proposed to address the unlearning problem, most existing approaches treat all data points to be unlearned equally, i.e., unlearning that Montreal is a city in Canada is treated exactly the same as unlearning the phone number of the first author of this paper. In this work, we show that this all data is equal assumption does not hold for LLM unlearning. We study how the success of unlearning depends on the frequency of the knowledge we want to unlearn in the pre-training data of a model and find that frequency strongly affects unlearning, i.e., more frequent knowledge is harder to unlearn. Additionally, we uncover a misalignment between probability and generation-based evaluations of unlearning and show that this problem worsens as models become larger. Overall, our experiments highlight the need for better evaluation practices and novel methods for LLM unlearning that take the training data of models into account.
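One way to operationalize "frequency of the knowledge in the pre-training data", in the spirit of the abstract, is to count how often any surface form of a fact occurs in a corpus sample. A minimal sketch, where the corpus, the surface forms, and the phone number are all invented for illustration:

```python
import re

def fact_frequency(corpus, surface_forms):
    """Count case-insensitive occurrences of any surface form of a fact."""
    pattern = re.compile("|".join(re.escape(s) for s in surface_forms), re.IGNORECASE)
    return sum(len(pattern.findall(doc)) for doc in corpus)

# Tiny invented corpus standing in for pre-training data.
corpus = [
    "Montreal is a city in Canada. Montreal, Canada hosts many festivals.",
    "She moved to Montreal, a city in Canada, last year.",
    "Call 555-0142 for reservations.",
]

# A commonsense fact has many surface forms and recurs across documents...
common = fact_frequency(corpus, ["Montreal is a city in Canada", "Montreal, Canada"])
# ...while a private fact, like a phone number, appears rarely.
rare = fact_frequency(corpus, ["555-0142"])
print(common, rare)  # the commonsense fact is counted more often than the rare one
```

Bucketing unlearning targets by such counts is one simple way to test the paper's claim that more frequent knowledge is harder to unlearn.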
Problem

Research questions and friction points this paper is trying to address.

Existing machine unlearning methods treat all data points to be unlearned as equal, an assumption that does not hold for LLMs.
Unlearning success depends strongly on how frequent the target knowledge is in the pre-training data.
Probability-based unlearning evaluations misalign with generation-based results, and the gap widens as models grow.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows that unlearning difficulty in LLMs varies with the frequency of the target knowledge in pre-training data
Demonstrates that frequency strongly affects unlearning success: more frequent knowledge is harder to remove
Calls for evaluation practices and unlearning methods that take a model's training data into account