🤖 AI Summary
To address the slow decoding speed, neglect of locality, and underutilization of code reuse in large language models (LLMs) for code editing tasks, this paper proposes the first editing-oriented speculative decoding framework. The method integrates an edit-aware lightweight draft model, a dynamic verifier, and a locality-aware edit localization strategy, explicitly modeling the localized nature of code changes and reusing original code segments to generate high-quality edit drafts. Unlike conventional speculative decoding, the framework is co-designed at both the architectural and mechanistic levels specifically for code editing. Experiments on CanItEdit and CodeIF-Bench demonstrate 10.38× and 13.09× decoding speedups, respectively, improving on state-of-the-art acceleration methods by up to 90.6% while preserving edit accuracy.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in code editing, substantially enhancing software development productivity. However, the inherent complexity of code editing tasks forces existing approaches to rely on LLMs' autoregressive end-to-end generation, where decoding speed plays a critical role in efficiency. While inference acceleration techniques such as speculative decoding have been applied to improve decoding efficiency, these methods fail to account for the unique characteristics of code editing tasks, where changes are typically localized and existing code segments are reused. To address this limitation, we propose EfficientEdit, a novel method that improves LLM-based code editing efficiency through two key mechanisms based on speculative decoding: (1) effective reuse of original code segments while identifying potential edit locations, and (2) efficient generation of edit content via high-quality drafts from edit-oriented draft models and a dynamic verification mechanism that balances quality and acceleration. Experimental results show that EfficientEdit achieves up to 10.38× and 13.09× speedups over standard autoregressive decoding on CanItEdit and CodeIF-Bench, respectively, outperforming state-of-the-art inference acceleration approaches by up to 90.6%.