LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM

📅 2025-12-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Code review has become a bottleneck in software development due to its time-consuming, knowledge-intensive nature and the scarcity of experienced reviewers. Existing automated approaches neglect both the contextual information of code changes and historical review knowledge, resulting in suboptimal review quality. To address this, the paper proposes LAURA, a large language model-based, context-aware review generation framework. It integrates retrieval-augmented generation (RAG), exemplar-based review retrieval, multi-granularity context augmentation, and a systematic prompt-guiding mechanism, and constructs a high-quality dataset for training and evaluation. Evaluated with ChatGPT-4o and DeepSeek-v3, LAURA generates review comments that are completely correct or at least helpful to developers in 42.2% and 40.4% of cases for the two models, respectively, significantly outperforming state-of-the-art baselines. This demonstrates substantial improvements in both the accuracy and the practical utility of automated code review.

πŸ“ Abstract
Code review is critical for ensuring software quality and maintainability. With the rapid growth in software scale and complexity, code review has become a bottleneck in the development process because of its time-consuming and knowledge-intensive nature and the shortage of experienced developers willing to review code. Several approaches have been proposed for automatically generating code reviews based on retrieval, neural machine translation, pre-trained models, or large language models (LLMs). These approaches mainly leverage historical code changes and review comments. However, a large amount of information crucial for code review, such as the context of code changes and prior review knowledge, has been overlooked. This paper proposes an LLM-based, review-knowledge-augmented, context-aware framework for code review generation, named LAURA. The framework integrates review exemplar retrieval, context augmentation, and systematic guidance to enhance the performance of ChatGPT-4o and DeepSeek v3 in generating code review comments. Moreover, given the prevalence of low-quality reviews in existing datasets, we also constructed a high-quality dataset. Experimental results show that for the two models, LAURA generates review comments that are either completely correct or at least helpful to developers in 42.2% and 40.4% of cases, respectively, significantly outperforming SOTA baselines. Furthermore, our ablation studies demonstrate that all components of LAURA contribute positively to improving comment quality.
Problem

Research questions and friction points this paper is trying to address.

Automates code review generation to address developer shortage and time constraints
Incorporates contextual code changes and historical review knowledge previously overlooked
Enhances LLM performance using retrieval-augmented context and systematic guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based framework with context-aware retrieval
Integrates review exemplars and systematic guidance
Constructs high-quality dataset for improved performance
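The retrieval-and-prompting pipeline described above can be sketched in a few lines. This is an illustrative assumption, not the paper's actual implementation: the helper names (`retrieve_exemplars`, `build_prompt`) and the Jaccard token-overlap similarity are stand-ins for whatever retriever and prompt template LAURA actually uses.

```python
import re

def tokenize(code: str) -> set[str]:
    """Crude lexical tokenizer: split code on non-alphanumeric characters."""
    return {t for t in re.split(r"\W+", code) if t}

def retrieve_exemplars(diff: str, history: list[tuple[str, str]], k: int = 2):
    """Rank historical (diff, review) pairs by Jaccard similarity to the new diff.

    A real system would likely use embedding- or BM25-based retrieval; token
    overlap is only a minimal, dependency-free stand-in.
    """
    query = tokenize(diff)
    scored = []
    for past_diff, review in history:
        cand = tokenize(past_diff)
        union = query | cand
        score = len(query & cand) / len(union) if union else 0.0
        scored.append((score, past_diff, review))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

def build_prompt(diff: str, context: str, exemplars) -> str:
    """Assemble a review-generation prompt from exemplars, code context, and the diff."""
    parts = ["You are an experienced code reviewer."]
    for _, past_diff, review in exemplars:
        parts.append(f"Example change:\n{past_diff}\nExample review:\n{review}")
    parts.append(f"Surrounding context:\n{context}")
    parts.append(f"Code change to review:\n{diff}\nReview comment:")
    return "\n\n".join(parts)
```

The resulting prompt string would then be sent to the backing LLM (e.g. ChatGPT-4o or DeepSeek v3); the exemplars supply historical review knowledge, while the context block carries the surrounding-code information that plain diff-only approaches omit.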
Authors
Yuxin Zhang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yuxia Zhang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Zeyu Sun, Institute of Software, Chinese Academy of Sciences, Beijing, China
Yanjie Jiang, Tianjin University (software refactoring and testing)
Hui Liu, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China