🤖 AI Summary
To address weak contextual robustness and insufficient logical reasoning generalization in large language model (LLM) knowledge editing, this paper proposes Activation-Guided Knowledge Editing (AGKE). AGKE models target facts as a probability distribution over paraphrases and logical entailments, and—novelly—employs optimal transport theory to align source and target fact distributions, thereby transcending the limitations of single-prompt editing paradigms. By performing distribution-level steering in the activation space, AGKE enables controllable and robust factual correction. On multiple knowledge editing benchmarks, AGKE achieves a 12.6% improvement in edit accuracy, reduces forgetting by 41%, and enhances logical consistency by 37% over state-of-the-art methods. These gains demonstrate significantly improved cross-context stability and reasoning generalization capability.
📝 Abstract
As Large Langue Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.