🤖 AI Summary
This work addresses two weaknesses of LLM-based repository-level code generation: inadequate planning and ineffective function reuse. Both are aggravated by the complex dependencies and limited context windows that make the task hard in the first place. The authors propose TICoder, a framework that combines a test-driven iterative planning mechanism with an implementation-aware retrieval strategy, which scores candidate callee functions from both a functional and an implementation perspective. To improve behavioral consistency and reuse, TICoder then selects relevant usage patterns through a two-stage strategy combining structure-based clustering and perplexity-based filtering. Experiments with various LLMs show that TICoder outperforms state-of-the-art methods by an average of 11.52% on widely used repository-level code generation benchmarks.
📝 Abstract
Repository-level code generation with Large Language Models (LLMs) remains challenging, primarily due to complex dependencies and limited context windows. Recent approaches adopt retrieval-augmented generation (RAG) and planning mechanisms to reuse potential callee functions in the repository. However, these approaches often suffer from two limitations: they lack test-driven behavioral guidance during planning, and they overlook the implementation logic embedded in repository code during reuse. As a result, generated plans may not align with expected behaviors, and retrieved functions may not be reused effectively. In this paper, we propose TICoder, a novel repository-level code generation framework that improves both planning and reuse. TICoder introduces a test-driven iterative planning mechanism that leverages test cases as behavioral specifications to refine implementation steps. Furthermore, TICoder employs an implementation-aware code reuse strategy, which retrieves potential callee functions using a dual-view similarity that captures both functional and implementation aspects. TICoder then identifies relevant usage patterns through a dual-stage selection strategy that combines structure-based clustering and perplexity-based filtering. We conduct extensive experiments on widely used repository-level code generation benchmarks with various LLMs. Experimental results demonstrate that TICoder outperforms state-of-the-art (SOTA) methods, achieving an average improvement of 11.52%.
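The abstract gives no formulas for the dual-view similarity or the perplexity-based filter, so the following is only an illustrative sketch of how such components might look, not TICoder's actual implementation. The linear blend with weight `alpha`, the rule of keeping the lowest-perplexity usage per cluster, and all names (`Candidate`, `dual_view_score`, `retrieve`, `select_per_cluster`) are assumptions made for this example. The per-view similarity scores, cluster ids, and token log-probabilities are taken as given inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A potential callee function with precomputed similarity scores (hypothetical)."""
    name: str
    func_sim: float  # similarity on the functional view, assumed to lie in [0, 1]
    impl_sim: float  # similarity on the implementation view, assumed to lie in [0, 1]


def dual_view_score(c: Candidate, alpha: float = 0.5) -> float:
    # Assumption: the two views are blended linearly; alpha is a made-up weight.
    return alpha * c.func_sim + (1.0 - alpha) * c.impl_sim


def retrieve(cands: list[Candidate], k: int = 2, alpha: float = 0.5) -> list[Candidate]:
    # Rank candidates by the blended score and keep the top k.
    return sorted(cands, key=lambda c: dual_view_score(c, alpha), reverse=True)[:k]


def perplexity(token_logprobs: list[float]) -> float:
    # Standard definition: exp of the negative mean natural-log probability per token.
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_per_cluster(usages: list[tuple[int, str, list[float]]]) -> dict[int, str]:
    # usages: (cluster_id, usage_snippet, token_logprobs) triples, where cluster_id
    # would come from some structure-based clustering step (not shown here).
    # Assumption: within each cluster, keep the snippet with the lowest perplexity.
    best: dict[int, tuple[str, float]] = {}
    for cluster_id, snippet, logprobs in usages:
        ppl = perplexity(logprobs)
        if cluster_id not in best or ppl < best[cluster_id][1]:
            best[cluster_id] = (snippet, ppl)
    return {cid: snippet for cid, (snippet, _) in best.items()}
```

In this sketch, a function that matches the requirement only on the surface (high `func_sim`, low `impl_sim`) can be outranked by one whose body is also close to the intended logic, which is the intuition behind retrieving on both views rather than on descriptions alone.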