🤖 AI Summary
This study investigates how instruction files affect the effectiveness of agentic pull requests (Agentic PRs). Leveraging the AIDev dataset, the authors conduct a before-and-after analysis of 15,549 agentic pull requests across 148 projects. They evaluate instruction efficacy through the merge rate, task complexity (e.g., code churn), and the effort required to merge (e.g., time to merge and number of comments). The work reveals the double-edged nature of instructions: while 27.7% of projects increased their merge rate by at least 20%, 26.35% saw it decrease. In projects whose merge rate improved, the instruction files tend to be substantially longer and better structured, with more sections and sub-sections. Building on these findings, the paper motivates an “Instructions-as-Code” perspective, calling for research that helps practitioners treat instruction authoring as a standard software engineering activity.
📝 Abstract
AI agents (e.g., GitHub Copilot) collaborate as teammates on different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). To make agents more efficient, developers create instruction files that guide the AI agents, covering how to navigate the project, locate the right components, run tests, respect best practices, and more. In this paper, we investigate the relationship between the creation of these instructions and the ability of AI agents to create better pull requests, i.e., PRs that have a higher chance of success (e.g., merge rate), address more complex tasks (e.g., code churn), and require less effort to be merged (e.g., time to merge). To this end, we analyze 15,549 agentic PRs from 148 projects in the AIDev dataset. Along these three dimensions, we compare each project before and after the creation of its instruction files. We find that specifying instructions for AI agents does not necessarily lead to better results. After the instruction files were introduced, 27.7% of the projects increased their merge rate by at least 20%, while 26.35% saw it decrease. The same mixed pattern appears for the amount of change (e.g., code churn, number of modified files) and for the effort needed to merge an agentic PR (e.g., merge time and number of comments). In an initial exploration, we find that projects that managed to increase their merge rate have substantially longer instruction files, which are also better structured, with a higher number of sections and sub-sections. Our results motivate research that assists practitioners in framing the development of instruction files as a software engineering activity (a.k.a. Instructions-as-Code).
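To make the per-project before-and-after comparison concrete, the following is a minimal Python sketch of classifying one project by the change in its merge rate. It rests on assumptions that the abstract does not state: the hypothetical `AgenticPR` record and the hypothetical `merge_rate` and `classify_project` helpers, splitting PRs at the creation date of the instruction file, and reading the 20% threshold as an absolute change of 20 percentage points applied symmetrically to increases and decreases. The paper may define any of these differently.

```python
from dataclasses import dataclass
from datetime import datetime
from fractions import Fraction
from typing import Optional


@dataclass
class AgenticPR:
    """One agentic pull request (hypothetical minimal schema)."""
    created_at: datetime
    merged: bool


def merge_rate(prs: list[AgenticPR]) -> Optional[Fraction]:
    """Share of merged PRs as an exact fraction; None when there are no PRs."""
    if not prs:
        return None
    return Fraction(sum(pr.merged for pr in prs), len(prs))


def classify_project(
    prs: list[AgenticPR],
    instructions_created_at: datetime,
    threshold: Fraction = Fraction(1, 5),
) -> str:
    """Compare a project's merge rate before vs. after its instruction file."""
    before = [pr for pr in prs if pr.created_at < instructions_created_at]
    after = [pr for pr in prs if pr.created_at >= instructions_created_at]
    rate_before, rate_after = merge_rate(before), merge_rate(after)
    if rate_before is None or rate_after is None:
        return "insufficient data"
    # Assumption: the threshold is an absolute difference (20 percentage points).
    # Fractions avoid floating-point error right at the threshold boundary.
    change = rate_after - rate_before
    if change >= threshold:
        return "increased"
    if change <= -threshold:
        return "decreased"
    return "stable"


# Usage with made-up data: merge rate goes from 1/2 before to 3/4 after.
cutoff = datetime(2025, 6, 1)
prs = [
    AgenticPR(datetime(2025, 5, 1), True),
    AgenticPR(datetime(2025, 5, 2), False),
    AgenticPR(datetime(2025, 6, 2), True),
    AgenticPR(datetime(2025, 6, 3), True),
    AgenticPR(datetime(2025, 6, 4), True),
    AgenticPR(datetime(2025, 6, 5), False),
]
print(classify_project(prs, cutoff))  # prints: increased
```

Counting the projects in each outcome class and dividing by the number of projects would yield shares comparable in form to the reported 27.7% and 26.35%. The same split could be reused for the other dimensions (code churn, time to merge) by swapping `merge_rate` for another per-period metric.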