🤖 AI Summary
Inference-time search for LLM code generation faces low search efficiency, high token overhead, and a lack of anytime capability: constructive tree search suffers from exponential state-space explosion, while existing improvement-based methods are hindered by sparse rewards and inefficient search policies. This paper proposes ReLoc, the first systematic, unified local search framework tailored for code generation, comprising four stages: initial code drafting, neighborhood code generation, candidate evaluation, and incumbent code updating. Its core contributions are: (1) a fine-grained revision reward model that provides dense, semantics-aware feedback based on revision distance; and (2) an algorithm-agnostic design supporting flexible instantiation (e.g., hill climbing, genetic algorithms) that ensures both efficiency and anytime behavior. Experiments demonstrate that ReLoc significantly outperforms tree search and state-of-the-art improvement-based methods across multiple code generation tasks, improving code quality, convergence speed, and inference efficiency.
📝 Abstract
Large Language Models (LLMs) with inference-time scaling techniques show promise for code generation, yet face notable efficiency and scalability challenges. Construction-based tree-search methods suffer from rapid growth in tree size, high token consumption, and a lack of the anytime property. In contrast, improvement-based methods offer better performance but often struggle with uninformative reward signals and inefficient search strategies. In this work, we propose ReLoc, a unified local search framework that effectively performs step-by-step code revision. Specifically, ReLoc explores a series of local revisions through four key algorithmic components: initial code drafting, neighborhood code generation, candidate evaluation, and incumbent code updating, each of which can be instantiated with specific decision rules to realize different local search algorithms such as Hill Climbing (HC) or Genetic Algorithm (GA). Furthermore, we develop a specialized revision reward model that evaluates code quality based on revision distance, producing fine-grained preferences that guide the local search toward more promising candidates. Finally, extensive experimental results demonstrate that our approach achieves superior performance across diverse code generation tasks, significantly outperforming both construction-based tree search and state-of-the-art improvement-based code generation methods.
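The four components above fit the shape of a classic local search loop. The sketch below is a minimal hill-climbing instantiation under assumptions: the paper's LLM-driven drafting, revision proposal, and revision-reward scoring are abstracted as injected callables (`draft_fn`, `neighbors_fn`, `reward_fn` are hypothetical names, not the authors' API), and a toy bit-vector objective stands in for code so the loop runs end to end.

```python
def local_search_hc(draft_fn, neighbors_fn, reward_fn, steps=20):
    """Hill-climbing instantiation of a ReLoc-style local search (sketch).

    draft_fn()         -> initial candidate     (initial code drafting)
    neighbors_fn(cand) -> revised candidates    (neighborhood code generation)
    reward_fn(cand)    -> numeric score         (candidate evaluation)
    The incumbent is replaced only when the best neighbor scores strictly
    higher (incumbent code updating); since a valid incumbent exists after
    every step, the search can be stopped at any time (anytime behavior).
    """
    incumbent = draft_fn()
    best_score = reward_fn(incumbent)
    for _ in range(steps):
        candidates = neighbors_fn(incumbent)
        if not candidates:
            break
        challenger = max(candidates, key=reward_fn)
        challenger_score = reward_fn(challenger)
        if challenger_score > best_score:
            incumbent, best_score = challenger, challenger_score
    return incumbent, best_score


# Toy demonstration: the "program" is a bit vector, and the reward counts
# positions matching a hidden target (standing in for passed test cases);
# each neighbor flips one bit (standing in for a single local revision).
target = [1, 0, 1, 1, 0, 1, 0, 0]
best, score = local_search_hc(
    draft_fn=lambda: [0] * len(target),
    neighbors_fn=lambda c: [c[:i] + [1 - c[i]] + c[i + 1:] for i in range(len(c))],
    reward_fn=lambda c: sum(a == b for a, b in zip(c, target)),
    steps=len(target),
)
print(best, score)
```

Swapping the update rule (e.g., keeping a population and recombining candidates instead of a single incumbent) would yield the GA instantiation under the same four-component interface.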