Improving LLM-Based Go Code Review through Issue-List Generation and Context Augmentation

📅 2026-06-01

🤖 AI Summary

Existing LLM-based code review approaches typically report only a single primary issue and underuse contextual information, which limits coverage and practical utility. This work proposes an issue-list review paradigm, in which the model enumerates all potential issues, and compares three context-augmentation strategies: neighboring code, LSP-based semantics, and IR-based similar co-change context. It further integrates candidates from no-context and context-enhanced generation and applies refinement-guided pruning to keep the candidate list at a practical size. Evaluated on 1,438 Go code review instances, the best configuration reaches 28.00% refinement exact match, a gain of 10.85 percentage points over primary-issue review without additional context (17.15%). It substantially outperforms CodeReviewer (15.02%) and approaches the human-oracle ceiling of 36.09%. Refinement-guided pruning reduces the average number of candidate comments per instance from 7.2 to 3.1 at top-5 while retaining nearly the full benefit.
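Candidate integration followed by pruning can be pictured as an order-preserving merge and then a ranked cut. The sketch below is illustrative only: the names `integrate` and `prune_top_k` are hypothetical, and the scoring function is an assumption, since the summary does not say how refinement-guided pruning ranks candidates.

```python
from typing import Callable, List


def integrate(*candidate_lists: List[str]) -> List[str]:
    """Merge candidate comment lists, dropping near-verbatim duplicates.

    Duplicates are detected by a case- and whitespace-insensitive key.
    This dedup rule is an assumption made for illustration.
    """
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for comment in candidates:
            key = " ".join(comment.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(comment)
    return merged


def prune_top_k(
    candidates: List[str],
    score: Callable[[str], float],  # hypothetical stand-in for a refinement-guided score
    k: int = 5,
) -> List[str]:
    """Keep at most k candidates, highest score first."""
    return sorted(candidates, key=score, reverse=True)[:k]
```

A list shorter than `k` passes through unchanged in size, which is consistent with an average below the top-5 cap.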

📝 Abstract

LLMs have shown strong potential for automating code review, yet their practical utility depends heavily on the design of generation and context strategies. In this paper, we investigate how to improve LLM-based code review through generation strategy and contextual augmentation. We first propose an issue-list review paradigm, in which LLMs enumerate all potential issues rather than reporting only the single most important one (i.e., primary-issue review). We then systematically compare three types of code context augmentation -- neighboring, LSP-based semantics, and IR-based similar co-change context -- and study how they influence issue discovery. Finally, we integrate candidates from no-context and context-enhanced generation to improve review coverage, and introduce refinement-guided pruning to keep the candidate list at a practical size. We evaluate our approach on 1,438 Go review instances using downstream code refinement as the main metric, i.e., how often the candidate list contains at least one comment inducing the same code change as the final human revision. For comparison, we evaluate comments by CodeReviewer, a model trained specifically for review comment generation, as well as ground-truth human review comments (as a practical upper bound), under the same refinement-based evaluation. The results show that our best configuration, combining issue-list review, neighboring and similar co-change context, and candidate integration, reaches 28.00% refinement exact match, a statistically significant gain of +10.85 percentage points over primary-issue review without any additional context (17.15%), substantially outperforming CodeReviewer (15.02%) and approaching the human-oracle ceiling of 36.09%. Our refinement-guided pruning reduces the average candidate count from 7.2 to 3.1 at top-5 while retaining nearly the full benefit, making the candidate list easier to inspect.
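The list-level metric described above counts an instance as a hit when at least one candidate comment leads to the same code change as the final human revision. A minimal sketch of that computation follows, assuming a refinement step is available as a function. `Instance`, `Refiner`, `normalize`, and `toy_refine` are hypothetical names, and the whitespace normalization is an assumption rather than the paper's stated matching rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    original: str          # code before review
    human_revision: str    # final human-revised code
    candidates: List[str]  # generated review comments


# A refiner maps (original code, review comment) to revised code.
Refiner = Callable[[str, str], str]


def normalize(code: str) -> str:
    """Strip outer blank space and trailing whitespace (assumed matching rule)."""
    return "\n".join(line.rstrip() for line in code.strip().splitlines())


def list_hits(inst: Instance, refine: Refiner) -> bool:
    """True if any candidate comment induces the human revision."""
    target = normalize(inst.human_revision)
    return any(
        normalize(refine(inst.original, comment)) == target
        for comment in inst.candidates
    )


def refinement_exact_match(instances: List[Instance], refine: Refiner) -> float:
    """Percentage of instances whose candidate list contains a hit."""
    if not instances:
        return 0.0
    hits = sum(list_hits(inst, refine) for inst in instances)
    return 100.0 * hits / len(instances)


def toy_refine(original: str, comment: str) -> str:
    """Stand-in for a refinement model: applies one hard-coded edit."""
    if "check the error" in comment.lower():
        return original.replace("f, _ := os.Open(p)", "f, err := os.Open(p)")
    return original
```

Under this definition, a longer candidate list can only keep or raise the score, which is why the abstract pairs candidate integration with pruning to hold the list at an inspectable size.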

Problem

Research questions and friction points this paper is trying to address.

LLM-based code review

issue-list generation

context augmentation

Go code review

code refinement

Innovation

Methods, ideas, or system contributions that make the work stand out.

issue-list review

context augmentation

code review