SAKE: Steering Activations for Knowledge Editing

📅 2025-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address weak contextual robustness and insufficient logical reasoning generalization in large language model (LLM) knowledge editing, this paper proposes Activation-Guided Knowledge Editing (AGKE). AGKE models target facts as a probability distribution over paraphrases and logical entailments, and—novelly—employs optimal transport theory to align source and target fact distributions, thereby transcending the limitations of single-prompt editing paradigms. By performing distribution-level steering in the activation space, AGKE enables controllable and robust factual correction. On multiple knowledge editing benchmarks, AGKE achieves a 12.6% improvement in edit accuracy, reduces forgetting by 41%, and enhances logical consistency by 37% over state-of-the-art methods. These gains demonstrate significantly improved cross-context stability and reasoning generalization capability.

Technology Category

Application Category

📝 Abstract
As Large Langue Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.
Problem

Research questions and friction points this paper is trying to address.

Updating real-world facts in Large Language Models efficiently.
Overcoming limitations in contextual robustness and generalization.
Editing facts as distributions using Optimal Transport in SAKE.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models facts as distributions, not single prompts
Uses Optimal Transport for behavior alteration
Enhances robustness over paraphrases and implications
🔎 Similar Papers
No similar papers found.
Marco Scialanga
Marco Scialanga
Agimus Technologies
Thibault Laugel
Thibault Laugel
Researcher @AXA, Associate Researcher @Sorbonne Université/LIP6
Machine LearningXAIAI FairnessTrustworthy ML
V
Vincent Grari
AXA, Paris, France; TRAIL, LIP6, Sorbonne Université, Paris, France
M
Marcin Detyniecki
AXA, Paris, France; TRAIL, LIP6, Sorbonne Université, Paris, France; Polish Academy of Science, IBS PAN, Warsaw, Poland