🤖 AI Summary
Existing automated program repair approaches struggle to effectively address advanced defects such as implicit type degradation and complex polymorphic control flows due to their reliance on static analysis and shallow execution feedback. To overcome these limitations, this work proposes DAIRA, a novel framework that natively integrates dynamic analysis into the reasoning loop of large language model agents. By leveraging lightweight runtime monitoring and test trace-driven mechanisms, DAIRA extracts execution-time data—including variable mutations and call stacks—to generate structured semantic reports that guide agents toward evidence-driven, deterministic repairs. Evaluated on SWE-bench Verified, DAIRA achieves a 79.4% issue resolution rate, substantially outperforming baseline methods by repairing more complex bugs while simultaneously reducing inference overhead by approximately 10% and input token consumption by 25%.
📝 Abstract
Translating natural language descriptions into viable code fixes remains a fundamental challenge in software engineering. While the proliferation of agentic large language models (LLMs) has vastly improved automated repository-level debugging, current frameworks hit a ceiling when dealing with sophisticated bugs like implicit type degradations and complex polymorphic control flows. Because these methods rely heavily on static analysis and superficial execution feedback, they lack visibility into intermediate runtime states. Consequently, agents are forced into costly, speculative trial-and-error loops, wasting computational tokens without successfully isolating the root cause.
To bridge this gap, we propose DAIRA (Dynamic Analysis-enhanced Issue Resolution Agent), a pioneering automated repair framework that natively embeds dynamic analysis into the agent's reasoning cycle. Driven by a Test Tracing-Driven methodology, DAIRA utilizes lightweight monitors to extract critical runtime data -- such as variable mutations and call stacks -- and synthesizes them into structured semantic reports. This mechanism fundamentally shifts the agent's behavior from blind guesswork to evidence-based, deterministic deduction. When powered by Gemini 3 Flash Preview, DAIRA establishes a new state-of-the-art (SOTA) performance, achieving a 79.4% resolution rate on the SWE-bench Verified dataset. Compared to existing baselines, our framework not only conquers highly complex defects but also cuts overall inference expenses by roughly 10% and decreases input token consumption by approximately 25%.