More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of prevailing code evaluation metrics, which predominantly emphasize functional correctness while overlooking the long-term implications of AI-generated code on maintainability, reusability, and developers’ subjective perceptions. To bridge this gap, the authors propose an integrated assessment framework that combines static code analysis with sentiment analysis of code review comments. Their findings reveal, for the first time, that pull requests generated by large language models (LLMs), though superficially plausible, exhibit higher redundancy and reduced module reuse—thereby subtly accumulating technical debt. Paradoxically, human reviewers tend to express neutral or even positive sentiments toward such code, highlighting a significant misalignment between objective code quality and subjective perception. This work thus offers a novel perspective and empirical foundation for the holistic evaluation of AI-generated code.
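The framework described above pairs static code metrics with reviewer-sentiment signals. One objective metric it highlights is redundancy from missed reuse opportunities. A minimal illustrative sketch (not the paper's actual tooling; the function name and window size are assumptions) is a clone-candidate ratio over normalized line windows:

```python
# Illustrative sketch only: estimate code redundancy by counting repeated
# normalized sliding line-windows ("clone" candidates). Not the paper's method.
from collections import Counter


def redundancy_ratio(source: str, window: int = 3) -> float:
    """Fraction of sliding line-windows that appear more than once.

    Higher values suggest copy-pasted code rather than factored-out reuse.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if len(lines) < window:
        return 0.0
    windows = [tuple(lines[i:i + window]) for i in range(len(lines) - window + 1)]
    counts = Counter(windows)
    # Count every window occurrence that belongs to a duplicated window.
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(windows)


# Duplicated logic scores higher than the same logic factored into a function.
dup = "x = a + b\ny = x * 2\nprint(y)\nx = a + b\ny = x * 2\nprint(y)\n"
uniq = "def f(a, b):\n    return (a + b) * 2\nprint(f(1, 2))\nprint(f(3, 4))\n"
```

In practice a study like this would use established clone detectors and maintainability metrics; the sketch only conveys the idea of measuring redundancy objectively rather than via pass rates.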

📝 Abstract
Large Language Model (LLM) Agents are advancing quickly and are increasingly leveraged to assist in development tasks such as code generation. While LLM Agents accelerate code generation, studies indicate they may introduce adverse effects on development. However, existing metrics solely measure pass rates, failing to reflect impacts on long-term maintainability and readability, and failing to capture developers' intuitive evaluations of pull requests (PRs). To address this gap, we investigate the characteristics of LLM-generated pull requests beyond the pass rate. We measure code quality and maintainability within PRs using code metrics to evaluate objective characteristics, and we analyze developers' reactions to pull requests authored by both humans and LLMs. Evaluation results indicate that LLM Agents frequently disregard code reuse opportunities, resulting in higher levels of redundancy than human developers. In contrast to these quality issues, our emotion analysis reveals that reviewers tend to express more neutral or positive emotions towards AI-generated contributions than towards human ones. This disconnect suggests that the surface-level plausibility of AI code masks redundancy, leading to the silent accumulation of technical debt in real-world development environments. Our research provides insights for improving human-AI collaboration.
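The subjective side of the evaluation labels reviewer comments as positive, neutral, or negative. As a hedged illustration only (the paper's actual emotion-analysis pipeline is not shown here; the word lists and function name are invented for this sketch), a toy lexicon-based labeler looks like:

```python
# Minimal lexicon-based sketch of review-comment sentiment labeling.
# Illustrative only: real studies typically use trained sentiment/emotion
# classifiers rather than hand-picked word lists like these.
POSITIVE = {"great", "clean", "nice", "good", "lgtm", "thanks", "elegant"}
NEGATIVE = {"bug", "broken", "messy", "duplicate", "redundant", "confusing", "wrong"}


def label_comment(comment: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for a review comment."""
    # Normalize: lowercase and strip trailing punctuation from each token.
    words = {w.strip(".,!?:;").lower() for w in comment.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Aggregating such labels over PRs grouped by author type (human vs. LLM) is what surfaces the misalignment the paper reports: objectively redundant AI code still attracting neutral-to-positive review sentiment.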
Problem

Research questions and friction points this paper is trying to address.

code quality
technical debt
AI-generated code
code reuse
pull requests
Innovation

Methods, ideas, or system contributions that make the work stand out.

code quality
technical debt
LLM-generated code
code reuse
reviewer sentiment