More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of prevailing code evaluation metrics, which predominantly emphasize functional correctness while overlooking the long-term implications of AI-generated code on maintainability, reusability, and developers’ subjective perceptions. To bridge this gap, the authors propose an integrated assessment framework that combines static code analysis with sentiment analysis of code review comments. Their findings reveal, for the first time, that pull requests generated by large language models (LLMs), though superficially plausible, exhibit higher redundancy and reduced module reuse—thereby subtly accumulating technical debt. Paradoxically, human reviewers tend to express neutral or even positive sentiments toward such code, highlighting a significant misalignment between objective code quality and subjective perception. This work thus offers a novel perspective and empirical foundation for the holistic evaluation of AI-generated code.
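The framework described above pairs static code metrics with reviewer-sentiment signals. One objective metric it highlights is redundancy from missed reuse opportunities. A minimal illustrative sketch (not the paper's actual tooling; the function name and window size are assumptions) is a clone-candidate ratio over normalized line windows:

```python
# Illustrative sketch only: estimate code redundancy by counting repeated
# normalized sliding line-windows ("clone" candidates). Not the paper's method.
from collections import Counter


def redundancy_ratio(source: str, window: int = 3) -> float:
    """Fraction of sliding line-windows that appear more than once.

    Higher values suggest copy-pasted code rather than factored-out reuse.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if len(lines) < window:
        return 0.0
    windows = [tuple(lines[i:i + window]) for i in range(len(lines) - window + 1)]
    counts = Counter(windows)
    # Count every window occurrence that belongs to a duplicated window.
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(windows)


# Duplicated logic scores higher than the same logic factored into a function.
dup = "x = a + b\ny = x * 2\nprint(y)\nx = a + b\ny = x * 2\nprint(y)\n"
uniq = "def f(a, b):\n    return (a + b) * 2\nprint(f(1, 2))\nprint(f(3, 4))\n"
```

In practice a study like this would use established clone detectors and maintainability metrics; the sketch only conveys the idea of measuring redundancy objectively rather than via pass rates.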

📝 Abstract
Large Language Model (LLM) Agents are advancing quickly and are increasingly leveraged to assist in development tasks such as code generation. While LLM Agents accelerate code generation, studies indicate they may introduce adverse effects on development. However, existing metrics solely measure pass rates, failing to reflect impacts on long-term maintainability and readability, and failing to capture developers' intuitive evaluations of pull requests (PRs). To address this gap, we investigate the characteristics of LLM-generated pull requests beyond the pass rate. We measure code quality and maintainability within PRs using code metrics to evaluate objective characteristics, and we analyze developers' reactions to pull requests authored by both humans and LLMs. Evaluation results indicate that LLM Agents frequently disregard code reuse opportunities, resulting in higher levels of redundancy than human developers. In contrast to these quality issues, our emotion analysis reveals that reviewers tend to express more neutral or positive emotions towards AI-generated contributions than towards human ones. This disconnect suggests that the surface-level plausibility of AI code masks redundancy, leading to the silent accumulation of technical debt in real-world development environments. Our research provides insights for improving human-AI collaboration.
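The subjective side of the evaluation labels reviewer comments as positive, neutral, or negative. As a hedged illustration only (the paper's actual emotion-analysis pipeline is not shown here; the word lists and function name are invented for this sketch), a toy lexicon-based labeler looks like:

```python
# Minimal lexicon-based sketch of review-comment sentiment labeling.
# Illustrative only: real studies typically use trained sentiment/emotion
# classifiers rather than hand-picked word lists like these.
POSITIVE = {"great", "clean", "nice", "good", "lgtm", "thanks", "elegant"}
NEGATIVE = {"bug", "broken", "messy", "duplicate", "redundant", "confusing", "wrong"}


def label_comment(comment: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for a review comment."""
    # Normalize: lowercase and strip trailing punctuation from each token.
    words = {w.strip(".,!?:;").lower() for w in comment.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Aggregating such labels over PRs grouped by author type (human vs. LLM) is what surfaces the misalignment the paper reports: objectively redundant AI code still attracting neutral-to-positive review sentiment.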
Problem

Research questions and friction points this paper is trying to address.

code quality
technical debt
AI-generated code
code reuse
pull requests
Innovation

Methods, ideas, or system contributions that make the work stand out.

code quality
technical debt
LLM-generated code
code reuse
reviewer sentiment