Retrieval-augmented code completion for local projects using large language models

📅 2024-08-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses privacy-preserving, offline-deployable code completion for local development environments. Method: We propose a lightweight (160M-parameter) LLM system featuring (i) a Jaccard-similarity-based in-context retrieval-augmented generation (RAG) mechanism—replacing the computationally heavy RETRO architecture—to enable dynamic, low-overhead retrieval of local code snippets; and (ii) fine-grained, code-aware tokenization combined with GPT-2 backbone adaptation on domain-specific Python corpora. Contribution/Results: To our knowledge, this is the first empirical validation of Jaccard-driven lightweight RAG for small-scale LMs, outperforming RETRO significantly. Experiments demonstrate that our model achieves completion accuracy competitive with large models while maintaining low inference latency and memory footprint (<4 GB), establishing a new paradigm for private, resource-efficient code intelligence in constrained settings.
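The in-context RAG mechanism described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: tokens are produced by a simple regex split (the paper uses a fine-grained, code-aware tokenizer), and the `retrieve`/`build_prompt` helper names are assumptions for illustration.

```python
import re

def tokenize(code: str) -> set[str]:
    """Split code into a set of identifier-like tokens (illustrative only)."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", code))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k local-project snippets most similar to the query context."""
    q = tokenize(query)
    ranked = sorted(snippets, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, snippets: list[str]) -> str:
    """Prepend the retrieved snippets to the completion context in-context."""
    return "\n".join(retrieve(query, snippets) + [query])
```

Because retrieval is plain set intersection over tokens, it needs no embedding index or extra forward passes, which is what makes it lighter than RETRO-style retrieval.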

📝 Abstract
The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on using LLMs with around 160 million parameters that are suitable for local execution and augmentation with retrieval from local projects. We train two models based on the transformer architecture, the generative model GPT-2 and the retrieval-adapted RETRO model, on open-source Python files, and empirically evaluate and compare them, confirming the benefits of vector embedding based retrieval. Further, we improve our models' performance with In-context retrieval-augmented generation, which retrieves code snippets based on the Jaccard similarity of tokens. We evaluate In-context retrieval-augmented generation on larger models and conclude that, despite its simplicity, the approach is more suitable than using the RETRO architecture. We highlight the key role of proper tokenization in achieving the full potential of LLMs in code completion.
Problem

Research questions and friction points this paper is trying to address.

Enhancing local code completion using small, efficient LLMs
Improving performance with retrieval-augmented generation techniques
Addressing privacy and computational constraints in LLM deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses small 160M-parameter LLMs for local execution
Employs in-context RAG based on Jaccard token similarity
Improves code completion by 26% over baseline
Marko Hostnik
Faculty of Computer and Information Science and Faculty of Mathematics and Physics, University of Ljubljana, Slovenia
Marko Robnik-Šikonja
Faculty of Computer and Information Science, University of Ljubljana, Slovenia