🤖 AI Summary
Existing LLM prompting methods struggle to efficiently explore large ensemble search spaces and lack repository-level contextual understanding, both of which limit their effectiveness in software issue resolution. Method: This paper proposes the first LLM agent framework specifically designed for ensemble reasoning in repository-level issue resolution, introducing the agent paradigm to this task. It performs end-to-end ensemble reasoning through three coordinated modules (generation, pruning, and selection) and supports dynamic test-time expansion. Contribution/Results: The design overcomes two limitations of conventional prompting approaches: limited scalability in search-space exploration and limited capacity for global semantic modeling. Evaluated on the SWE-bench benchmark, the method achieves an average 10.22% improvement in Pass@1 over baselines and attains a score of 75.20%, ranking first on the SWE-bench Verified leaderboard. The implementation is publicly available.
📝 Abstract
Software issue resolution is a critical challenge in software engineering and has garnered increasing attention in recent years. With the rapid advancement of large language models (LLMs), substantial progress has been made in addressing real-world software engineering tasks. Recent studies have introduced ensemble reasoning techniques to enhance the performance of LLM-based issue resolution. However, existing prompting-based methods still face limitations in effectively exploring large ensemble spaces and lack the capacity for repository-level understanding, both of which constrain their overall effectiveness. In this paper, we propose Trae Agent, the first agent-based ensemble reasoning approach for repository-level issue resolution. Trae Agent formulates issue resolution as an optimal-solution search problem and addresses two key challenges, i.e., large ensemble spaces and repository-level understanding, through modular agents for generation, pruning, and selection. We conduct extensive experiments using three leading LLMs on the widely adopted SWE-bench benchmark, comparing Trae Agent against four state-of-the-art ensemble reasoning techniques. Experimental results demonstrate that Trae Agent consistently achieves superior performance, with an average improvement of 10.22% over all baselines in terms of Pass@1. Trae Agent has achieved first place on the SWE-bench Verified leaderboard, with a notable Pass@1 score of 75.20%. We are pleased to release Trae Agent as an open-source project to support the research community, with all resources available at https://github.com/bytedance/trae-agent.
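The generation, pruning, and selection pipeline described above can be sketched as a search over candidate patches. This is a minimal illustration only: the candidate representation, the scoring function, and all function names here are assumptions, not Trae Agent's actual API, and the real system would invoke LLM agents with repository context at each stage.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate patch with an illustrative quality score (hypothetical)."""
    patch: str
    score: float  # e.g., fraction of regression tests passed (assumed metric)

def generate(issue: str, n: int) -> list[Candidate]:
    # Hypothetical generation stage: in the real system each candidate would
    # come from an LLM agent exploring the repository; here we fabricate
    # n candidates with synthetic scores purely for illustration.
    return [Candidate(patch=f"patch-{i} for {issue}", score=i / n)
            for i in range(1, n + 1)]

def prune(candidates: list[Candidate], keep: int) -> list[Candidate]:
    # Pruning stage: discard low-scoring candidates to shrink the
    # ensemble search space before the final decision.
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]

def select(candidates: list[Candidate]) -> Candidate:
    # Selection stage: choose the single best remaining candidate.
    return max(candidates, key=lambda c: c.score)

def resolve(issue: str, n: int = 8, keep: int = 3) -> Candidate:
    # End-to-end pipeline: generate -> prune -> select.
    return select(prune(generate(issue, n), keep))

best = resolve("fix-issue-123")
print(best.patch)  # → patch-8 for fix-issue-123
```

The point of the staged structure is that pruning bounds the cost of the final selection step, so the ensemble can be widened at generation time (more candidates) without the selection stage growing proportionally.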