Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional information retrieval-based bug localization loses accuracy in large software repositories because bug reports contain noise. This paper proposes an LLM-driven agent framework that improves a BM25 retrieval pipeline through lightweight query reformulation and extraction of key information from bug reports. The approach uses an open-source LLM without fine-tuning to rewrite queries and distill the relevant context, and it builds an automated, repository-level pipeline for file-level localization. Experiments show a 35% improvement in Mean Reciprocal Rank (MRR) over the BM25 baseline and up to 22% higher Top-10 file recall than SWE-agent. The results indicate that reformulation makes retrieval more robust to noisy bug reports.
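The two metrics named in the summary can be sketched as follows. This is a minimal illustration using the standard textbook definitions of MRR and Top-k recall; the paper's exact evaluation protocol is not given here, so the function names and the example data are assumptions.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked_files: Sequence[str], buggy_files: Set[str]) -> float:
    """Return 1/rank of the first truly buggy file, or 0.0 if none is retrieved."""
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over (ranked_files, buggy_files) pairs."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, b) for r, b in results) / len(results)


def recall_at_k(ranked_files: Sequence[str], buggy_files: Set[str], k: int = 10) -> float:
    """Fraction of the buggy files that appear among the top-k ranked files."""
    if not buggy_files:
        return 0.0
    hits = len(set(ranked_files[:k]) & buggy_files)
    return hits / len(buggy_files)


# Hypothetical data: two bug reports, each with a ranking and a ground-truth set.
example_results = [
    (["a.py", "b.py", "c.py"], {"b.py"}),  # first hit at rank 2
    (["x.py", "y.py"], {"x.py"}),          # first hit at rank 1
]
print(mean_reciprocal_rank(example_results))
```

MRR rewards placing the first buggy file near the top of the list, which is why it is a natural fit for the "first-file retrieval" result quoted in the abstract.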

📝 Abstract
Bug localization remains a critical yet time-consuming challenge in large-scale software repositories. Traditional information retrieval-based bug localization (IRBL) methods rely on unchanged bug descriptions, which often contain noisy information, leading to poor retrieval accuracy. Recent advances in large language models (LLMs) have improved bug localization through query reformulation, yet the effect on agent performance remains unexplored. In this study, we investigate how an LLM-powered agent can improve file-level bug localization via lightweight query reformulation and summarization. We first employ an open-source, non-fine-tuned LLM to extract key information from bug reports, such as identifiers and code snippets, and to reformulate queries before retrieval. Our agent then orchestrates BM25 retrieval using these preprocessed queries, automating the localization workflow at scale. Using the best-performing query reformulation technique, our agent achieves 35% better ranking in first-file retrieval than our BM25 baseline and up to +22% file retrieval performance over SWE-agent.
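The reformulate-then-retrieve idea from the abstract can be sketched in a few lines. This is not the authors' implementation: the paper uses an LLM to extract identifiers and code snippets, whereas `reformulate` below is a regex stand-in chosen so the example runs without a model. The `BM25` class is a plain Okapi BM25 scorer, and the repository contents, the bug report, and all names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase tokens; compound identifiers also contribute their sub-words."""
    tokens: List[str] = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        tokens.append(word.lower())
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", word)
        if len(parts) > 1:
            tokens.extend(p.lower() for p in parts)
    return tokens


def reformulate(bug_report: str) -> str:
    """Keep only code-like content: backtick spans, file names, compound identifiers.

    Assumption: a regex stand-in for the LLM extraction step described in the abstract.
    """
    code_spans = re.findall(r"`([^`]+)`", bug_report)
    file_names = re.findall(r"[\w/]+\.(?:py|java|js|ts|go|rs|c|cpp)\b", bug_report)
    identifiers = re.findall(
        r"\b(?:[a-z]+[A-Z]\w*|[A-Z][a-z]+[A-Z]\w*|\w+_\w+)\b", bug_report
    )
    kept = code_spans + file_names + identifiers
    return " ".join(kept) if kept else bug_report  # fall back to the raw report


class BM25:
    """Okapi BM25 over a {path: content} mapping; the path is indexed with the content."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.paths = list(docs)
        self.term_freqs = [Counter(tokenize(p + " " + docs[p])) for p in self.paths]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = (sum(self.doc_lens) / max(len(self.doc_lens), 1)) or 1.0
        doc_freq = Counter(t for tf in self.term_freqs for t in tf)
        n = len(self.paths)
        self.idf = {
            t: math.log(1 + (n - df + 0.5) / (df + 0.5)) for t, df in doc_freq.items()
        }

    def rank(self, query: str) -> List[Tuple[str, float]]:
        """Return (path, score) pairs sorted by descending score."""
        q_tokens = tokenize(query)  # repeated query terms are counted repeatedly
        scored = []
        for path, tf, length in zip(self.paths, self.term_freqs, self.doc_lens):
            norm = self.k1 * (1 - self.b + self.b * length / self.avg_len)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q_tokens
                if t in tf
            )
            scored.append((path, score))
        return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical three-file repository and a noisy bug report.
repo = {
    "src/config_loader.py": "def load_config(path): parse yaml settings, return ConfigLoader",
    "src/http_server.py": "class HttpServer: def handle_request(self, request): pass",
    "src/cli.py": "def main(): start the app and print thanks when done",
}
report = (
    "Hi, thanks for the app! It crashes when I start it. "
    "I think `load_config` in config_loader.py fails on empty files."
)

query = reformulate(report)
ranking = BM25(repo).rank(query)
print(query)
print(ranking[0][0])
```

Conversational words such as "thanks" and "app" overlap with unrelated files when the raw report is used as the query; dropping them before retrieval is the kind of noise reduction the abstract attributes to reformulation.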

Problem

Research questions and friction points this paper is trying to address.

Improves bug localization via LLM query reformulation
Automates retrieval workflow for large software repositories
Enhances file-level accuracy over traditional IRBL methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM extracts key bug report details for query reformulation
Agent orchestrates BM25 retrieval with preprocessed queries
Lightweight reformulation improves first-file retrieval ranking by 35%