🤖 AI Summary
This work addresses the challenges of knowledge interference and catastrophic forgetting in large language models during continual editing, which often arise from updating static layers. To mitigate these issues, the authors propose a hierarchical reinforcement learning framework that introduces, for the first time, an instance-aware dynamic layer selection mechanism. This approach adaptively activates the most relevant model layers for each edit, enabling precise and localized knowledge updates. By integrating parameter perturbation with intrinsic rewards based on sparsity, the method significantly enhances both editing efficiency and stability. Experimental results across multiple mainstream large language models demonstrate an average improvement of 8.48% in RLEdit performance, while requiring perturbations in only approximately half of the network layers per edit.
📝 Abstract
Lifelong model editing (LME) aims to sequentially rectify outdated or inaccurate knowledge in deployed LLMs while minimizing side effects on unrelated inputs. However, existing approaches typically apply parameter perturbations to a static and dense set of LLM layers for all editing instances. This practice is counter-intuitive, as we hypothesize that different pieces of knowledge are stored in distinct layers of the model. Neglecting this layer-wise specificity can impede adaptability in integrating new knowledge and result in catastrophic forgetting for both general and previously edited knowledge. To address this, we propose HiEdit, a hierarchical reinforcement learning framework that adaptively identifies the most knowledge-relevant layers for each editing instance. By enabling dynamic, instance-aware layer selection and incorporating an intrinsic reward for sparsity, HiEdit achieves precise, localized updates. Experiments on various LLMs show that HiEdit boosts the performance of the competitive RLEdit by an average of 8.48% with perturbing only half of the layers per edit. Our code is available at: https://github.com/yangfanww/hiedit.