🤖 AI Summary
Existing knowledge editing (KE) methods suffer significant performance degradation in multi-hop fact updates, especially when edits involve implicit intermediate subjects within reasoning chains. This work uncovers a mechanism in large language models wherein implicit subjects drive multi-hop reasoning through dynamically activated cross-layer query-value (Q-V) neuron pathways. Building on this insight, we propose ACE, an attribution-controlled KE framework that leverages causal analysis and neuron-level attribution to precisely identify and edit critical Q-V pathways, enabling fine-grained intervention over implicit knowledge associations. Experiments demonstrate that ACE outperforms state-of-the-art methods by 9.44% on GPT-J and by 37.46% on Qwen3-8B. To our knowledge, ACE is the first KE approach grounded in dynamic neuronal mechanisms, establishing a novel, interpretable, and controllable paradigm for multi-hop knowledge editing.
📝 Abstract
Large Language Models (LLMs) require efficient knowledge editing (KE) to update factual information, yet existing methods exhibit significant performance decay in multi-hop factual recall. This failure is particularly acute when edits involve intermediate implicit subjects within reasoning chains. Through causal analysis, we reveal that this limitation stems from overlooking how chained knowledge is dynamically represented and utilized at the neuron level. We discover that during multi-hop reasoning, implicit subjects function as query neurons, which sequentially activate corresponding value neurons across transformer layers to accumulate information toward the final answer, a dynamic that prior KE work has overlooked. Guided by this insight, we propose ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall, a framework that leverages neuron-level attribution to identify and edit these critical query-value (Q-V) pathways. ACE provides a mechanistically grounded solution for multi-hop KE, empirically outperforming state-of-the-art methods by 9.44% on GPT-J and 37.46% on Qwen3-8B. Our analysis further reveals more fine-grained activation patterns in Qwen3 and demonstrates that the semantic interpretability of value neurons is orchestrated by query-driven accumulation. These findings establish a new pathway for advancing KE capabilities based on a principled understanding of internal reasoning mechanisms.
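To make the idea of neuron-level attribution over query-value pathways more concrete, here is a minimal toy sketch in Python. It is not the ACE implementation: the scoring rules, the function names (`value_neuron_scores`, `query_neuron_scores`), and all matrices are illustrative assumptions. It assumes a value neuron can be scored by how much its contribution raises the log-probability of the target token, and a query neuron by how much its output raises the pre-activation of a chosen value neuron.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def value_neuron_scores(h, acts, W_out, W_unembed, target):
    """Assumed scoring rule: gain in log p(target) when neuron i's
    contribution acts[i] * W_out[i] is added to the residual state h."""
    base = log_softmax(W_unembed @ h)[target]
    scores = np.empty(len(acts))
    for i, a in enumerate(acts):
        scores[i] = log_softmax(W_unembed @ (h + a * W_out[i]))[target] - base
    return scores

def query_neuron_scores(q_acts, Q_out, key):
    """Assumed scoring rule: how much each earlier-layer neuron's output
    q_acts[j] * Q_out[j] adds to the pre-activation (key . input) of a
    later value neuron."""
    return q_acts * (Q_out @ key)

# Toy setup (all values hypothetical): 4-dim residual stream, 4-token vocab.
W_unembed = np.eye(4)                      # token t reads dimension t
h = np.zeros(4)                            # residual state before the layer
W_out = np.array([[0.1, 0.0, 0.0, 0.0],    # neuron 0: weakly promotes token 0
                  [0.0, 0.0, 2.0, 0.0],    # neuron 1: promotes token 2
                  [0.0, 0.5, 0.0, 0.0]])   # neuron 2: promotes token 1
acts = np.ones(3)
target = 2

v_scores = value_neuron_scores(h, acts, W_out, W_unembed, target)
top_value = int(np.argmax(v_scores))       # value neuron to consider editing

# Earlier-layer candidates feeding the chosen value neuron's key direction.
key = np.array([0.0, 3.0, 0.0, 0.0])       # hypothetical key of top_value
Q_out = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
q_scores = query_neuron_scores(np.ones(2), Q_out, key)
top_query = int(np.argmax(q_scores))

print("value scores:", v_scores, "top value neuron:", top_value)
print("query scores:", q_scores, "top query neuron:", top_query)
```

In this toy setting, the pair (`top_query`, `top_value`) plays the role of a single Q-V pathway. A real analysis would take activations and weights from the model under study, and the choice of which pathways to edit would follow the paper's own attribution procedure.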