🤖 AI Summary
Current LLM editing methods are confined to short-text scenarios and fail to address real-world document-level editing requirements—namely, long input/output sequences, joint updates of multiple interdependent facts, and cross-paragraph generalization. This work formally defines the document-level model editing task for the first time and introduces DocMEdit, the first benchmark dataset tailored to this task, supporting document-scale I/O, multi-fact collaborative editing, and extrapolation evaluation. We propose a comprehensive evaluation framework integrating factual consistency, document coherence, and cross-paragraph generalization. Empirical results demonstrate substantial performance degradation of state-of-the-art editing methods on document-level tasks, exposing fundamental limitations in long-context modeling and multi-fact coordination. This work establishes a critical benchmark and evaluation paradigm to advance LLM editing from short-text settings toward practical document-level applications.
📝 Abstract
Model editing aims to correct errors and outdated knowledge in large language models (LLMs) at minimal cost. Prior research has proposed a variety of datasets to assess the effectiveness of model editing methods. However, most existing datasets only require models to output short phrases or sentences and overlook the prevalence of document-level tasks in the real world, which raises doubts about their practical usability. To address this limitation and promote the application of model editing in real-world scenarios, we propose the task of document-level model editing. To tackle these challenges and enhance model capabilities in practical settings, we introduce DocMEdit, a dataset focused on document-level model editing, characterized by document-level inputs and outputs, extrapolation, and multiple facts within a single edit. We propose a series of evaluation metrics and experiments. The results show that the difficulties of document-level model editing pose challenges for existing model editing methods.