Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM unlearning evaluations focus narrowly on removing isolated facts, neglecting logical dependencies among pieces of knowledge and the non-deterministic nature of internal model representations; as a result, knowledge presumed forgotten can be retained implicitly. Method: We propose the first unlearning evaluation framework that integrates knowledge graphs with confidence modeling. It introduces a knowledge-relevance-aware, confidence-aware evaluation paradigm and an LLM-based, calibratable, human-aligned automated adjudication protocol, combining prompt-engineered subgraph discrimination with calibration against human evaluation; it also contributes a newly constructed unlearning benchmark. Contribution/Results: Experiments reveal that mainstream unlearning methods overestimate success rates by 23.7% on average, and the LLM adjudicator achieves 92.4% agreement with human evaluators. The implementation is publicly available.

📝 Abstract
Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.
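As a minimal illustration of the representation the abstract describes (not the paper's actual data model; entities, relations, and confidence values below are invented), factual context around an unlearning target can be stored as confidence-scored triples forming a small knowledge graph:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    confidence: float  # model's belief in this fact, in [0, 1]

# Toy subgraph around one unlearning target; all values are illustrative.
graph = [
    Fact("Marie Curie", "born_in", "Warsaw", 0.95),      # target fact
    Fact("Marie Curie", "nationality", "Polish", 0.90),  # correlated fact
    Fact("Warsaw", "capital_of", "Poland", 0.99),        # background fact
]

def subgraph_around(graph, entity):
    """Extract the local subgraph a judge would reason over for one entity."""
    return [f for f in graph if entity in (f.subject, f.obj)]

print(len(subgraph_around(graph, "Marie Curie")))  # 2
```

The confidence field is what lets an evaluator treat knowledge as non-deterministic: a fact can survive unlearning weakly rather than being simply present or absent.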
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM unlearning of implicit knowledge dependencies
Assessing unlearning effectiveness with knowledge graphs
Addressing overestimation in current unlearning evaluations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge graphs represent factual contexts
LLM judges evaluate unlearning success
Calibrated prompts ensure trustworthy assessments
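To make the Innovation bullets concrete, here is a hypothetical sketch of subgraph-based adjudication. The paper uses an LLM judge reasoning in free form; the two-hop rule below is a toy stand-in for that reasoning, and all names and scores are invented. Triples are plain tuples `(subject, relation, object, confidence)`:

```python
def judge_unlearned(target, subgraph, threshold=0.5):
    """Toy adjudicator: the target counts as forgotten only if neither the
    fact itself nor a correlated chain through the retained subgraph
    supports it above the confidence threshold."""
    subj, rel, obj = target
    # Direct retention: the target fact itself survives with high confidence.
    for s, r, o, c in subgraph:
        if (s, r, o) == (subj, rel, obj) and c >= threshold:
            return False
    # Implicit retention: a two-hop chain re-links the target entity pair
    # (a simple inference rule standing in for the LLM judge's reasoning).
    for s1, r1, o1, c1 in subgraph:
        for s2, r2, o2, c2 in subgraph:
            if s1 == subj and o1 == s2 and o2 == obj and min(c1, c2) >= threshold:
                return False
    return True

# The direct fact ("Marie Curie", "born_in", "Warsaw") was removed, but
# correlated facts remain:
retained = [
    ("Marie Curie", "grew_up_in", "Mazovia", 0.90),
    ("Mazovia", "has_capital", "Warsaw", 0.95),
]
# A fact-level probe would call the target forgotten; the subgraph judge
# flags implicit retention instead.
print(judge_unlearned(("Marie Curie", "born_in", "Warsaw"), retained))  # False
```

This gap between fact-level checks and subgraph-level adjudication is exactly the overestimation effect the paper reports for existing evaluation strategies.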