How Do Agents Perform Code Optimization? An Empirical Study

📅 2025-12-25

🤖 AI Summary

Problem: Prior work lacks empirical evidence comparing AI coding agents with human developers on real-world performance optimization tasks. Method: We conduct the first empirical study of this question using the AIDev dataset. We analyze 324 AI-agent pull requests (PRs) and 83 human PRs through PR metadata mining, classification of code-change patterns, manual annotation, and chi-square tests. Contribution/Results: (1) AI agents include explicit performance validation significantly less often than humans (45.7% vs. 63.6%, *p* = 0.007), revealing a systemic verification deficit; (2) their optimization strategies closely match human patterns, suggesting they are reliable at the level of strategy choice. This work establishes the first empirically grounded benchmark for AI agents in performance optimization and identifies the “verification gap” as the critical bottleneck for trust. It also outlines theoretical foundations and practical paths toward verifiable, trustworthy AI code optimizers.
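
The summary names chi-square tests as the statistical method behind the validation-rate comparison. The sketch below is an illustration only. It runs a Pearson chi-square test of independence on a 2×2 table (agent vs. human PRs, validated vs. not validated). The counts are hypothetical: 148 of 324 agent PRs (≈45.7%) and 53 of 83 human PRs (≈63.9%) were chosen to sit near the reported rates. This page does not give the paper's exact contingency table, so the resulting p-value will not match the reported *p* = 0.007. The helper name `chi_square_2x2` is ours. Omitting Yates' continuity correction is an assumption about the authors' setup, not something the source states.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for the 2x2 table
        [[a, b],
         [c, d]]
    without Yates' continuity correction (an assumption, see lead-in).
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be non-zero")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / n.
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = (agent PRs, human PRs),
# columns = (validated, not validated).
agent_validated, agent_total = 148, 324  # 148/324 ≈ 45.7%
human_validated, human_total = 53, 83    # 53/83 ≈ 63.9%
stat, p = chi_square_2x2(agent_validated, agent_total - agent_validated,
                         human_validated, human_total - human_validated)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

SciPy's `scipy.stats.chi2_contingency` computes the same statistic. However, it applies Yates' correction to 2×2 tables by default (`correction=True`), which gives a somewhat smaller statistic and larger p-value. The standard-library version above avoids that dependency by using the closed form of the 1-degree-of-freedom tail probability.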

📝 Abstract

Performance optimization is a critical yet challenging aspect of software development, often requiring a deep understanding of system behavior, algorithmic tradeoffs, and careful code modifications. Although recent advances in AI coding agents have accelerated code generation and bug fixing, little is known about how these agents perform on real-world performance optimization tasks. We present the first empirical study comparing agent- and human-authored performance optimization commits, analyzing 324 agent-generated and 83 human-authored PRs from the AIDev dataset across adoption, maintainability, optimization patterns, and validation practices. We find that AI-authored performance PRs are less likely to include explicit performance validation than human-authored PRs (45.7% vs. 63.6%, $p=0.007$). In addition, AI-authored PRs largely use the same optimization patterns as humans. We further discuss limitations and opportunities for advancing agentic code optimization.

Problem

Research questions and friction points this paper is trying to address.

Evaluates AI agents' real-world performance optimization capabilities

Compares AI and human code optimization patterns and validation

Identifies gaps in AI agents' performance validation practices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Empirical study compares AI and human optimization commits

Analyzes adoption, maintainability, patterns, and validation practices

AI-authored PRs include explicit performance validation less often than human-authored PRs
