Rethinking Code Review Workflows with LLM Assistance: An Empirical Study

📅 2025-05-22
🤖 AI Summary
Code review suffers from frequent context switching and insufficient contextual information, while LLM-based review tools raise concerns about false positives and trustworthiness. Through an industrial field study, this work identifies core pain points and builds two prototypes on a retrieval-augmented generation (RAG) semantic search pipeline: AI-driven pre-review and on-demand interactive review. It presents the first empirical comparison of both paradigms on real-world industrial pull requests (PRs). The authors design a context-aware RAG pipeline that integrates structured PR modeling, semantic retrieval, and trust-oriented prompt engineering. Results show that developers prefer AI-driven pre-review, which is reported to reduce context-switching overhead, lower false positives, and improve efficiency; these effects are moderated by the reviewer's familiarity with the code and by PR severity. The key contributions are empirical evidence that the interaction paradigm shapes the trustworthiness and practicality of LLM-based review, and an industry-oriented RAG optimization framework.

📝 Abstract
Code reviews are a critical yet time-consuming aspect of modern software development, increasingly challenged by growing system complexity and the demand for faster delivery. This paper presents a study conducted at WirelessCar Sweden AB, combining an exploratory field study of current code review practices with a field experiment involving two variations of an LLM-assisted code review tool. The field study identifies key challenges in traditional code reviews, including frequent context switching and insufficient contextual information, and highlights both opportunities (e.g., automatic summarization of complex pull requests) and concerns (e.g., false positives and trust issues) in using LLMs. In the field experiment, we developed two prototype variations: one offering LLM-generated reviews upfront and the other enabling on-demand interaction. Both use a semantic search pipeline based on retrieval-augmented generation to assemble relevant contextual information for the review, thereby addressing the challenges uncovered in the field study. Developers evaluated both variations in real-world settings: AI-led reviews were preferred overall, although the preference depended on the reviewer's familiarity with the code base and on the severity of the pull request.
Problem

Research questions and friction points this paper is trying to address.

Addressing time-consuming challenges in traditional code reviews
Exploring LLM-assisted tools to improve review efficiency and accuracy
Evaluating developer preferences for AI-led vs on-demand review workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted code review tool variations
Semantic search pipeline for contextual information
AI-led reviews preferred in real-world settings
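
The semantic search step named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`embed`, `retrieve`, `build_review_prompt`), the prompt wording, and the sample code chunks are all hypothetical, and plain term-count vectors stand in for the embedding model a real RAG pipeline would use. The sketch only shows the general idea of ranking code-base chunks against a pull request and assembling the best matches into review context.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_review_prompt(pr_text: str, chunks: list[str], k: int = 2) -> str:
    """Assemble a review prompt from the PR text and retrieved context.

    The prompt wording is an assumption made for illustration only.
    """
    context = "\n".join(f"- {c}" for c in retrieve(pr_text, chunks, k))
    return (
        "Review the following pull request.\n"
        f"Relevant context from the code base:\n{context}\n"
        f"Pull request:\n{pr_text}\n"
        "Only report issues you can justify from the context above."
    )


if __name__ == "__main__":
    # Hypothetical code-base chunks and PR description.
    code_chunks = [
        "def parse_token(raw): validates the auth token and returns claims",
        "README: how to build the docs site with make",
        "def refresh_token(session): rotates the auth token for a session",
    ]
    pr = "fix auth token expiry check in parse_token"
    print(build_review_prompt(pr, code_chunks))
```

Under the same assumptions, an upfront (AI-led) variant would call `build_review_prompt` once when the pull request is opened, while an on-demand variant would call `retrieve` with the reviewer's question as the query each time the reviewer asks something.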
Fannar Steinn Adhalsteinsson
WirelessCar Sweden AB, Gothenburg, Sweden; Chalmers University of Technology, Gothenburg, Sweden
Bjorn Borgar Magnússon
WirelessCar Sweden AB, Gothenburg, Sweden; Chalmers University of Technology, Gothenburg, Sweden
Mislav Milicevic
WirelessCar Sweden AB, Gothenburg, Sweden
Adam Nirving Davidsson
WirelessCar Sweden AB, Gothenburg, Sweden
Chih-Hong Cheng
Carl von Ossietzky University of Oldenburg & Chalmers University of Technology
AI safety, software engineering, formal methods