SAKE: Steering Activations for Knowledge Editing

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

career value

172K/year

🤖 AI Summary

To address weak contextual robustness and insufficient logical reasoning generalization in large language model (LLM) knowledge editing, this paper proposes Activation-Guided Knowledge Editing (AGKE). AGKE models target facts as a probability distribution over paraphrases and logical entailments, and—novelly—employs optimal transport theory to align source and target fact distributions, thereby transcending the limitations of single-prompt editing paradigms. By performing distribution-level steering in the activation space, AGKE enables controllable and robust factual correction. On multiple knowledge editing benchmarks, AGKE achieves a 12.6% improvement in edit accuracy, reduces forgetting by 41%, and enhances logical consistency by 37% over state-of-the-art methods. These gains demonstrate significantly improved cross-context stability and reasoning generalization capability.

Technology Category

Application Category

📝 Abstract

As Large Langue Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.

Problem

Research questions and friction points this paper is trying to address.

Updating real-world facts in Large Language Models efficiently.

Overcoming limitations in contextual robustness and generalization.

Editing facts as distributions using Optimal Transport in SAKE.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Models facts as distributions, not single prompts

Uses Optimal Transport for behavior alteration

Enhances robustness over paraphrases and implications

🔎 Similar Papers

No similar papers found.