Where's the Bug? Attention Probing for Scalable Fault Localization

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of precise fault localization (FL) in code with large language models (LLMs), without an execution environment or annotated localization data, this paper proposes BAP (Bug Attention Probe), an attention-based probe for FL. Methodologically, BAP learns defect-sensitive patterns from LLM attention distributions, eliminating any reliance on line-level labels or runtime feedback, and generalizes across diverse bug types and programming languages. Evaluated on eight benchmark datasets, BAP improves top-1 localization accuracy by an average of 34.6% over the strongest baseline and by 93.4% over zero-shot prompting of GPT-4o, while running at a small fraction of the computational cost of prompting large models. This work demonstrates the feasibility and strength of a purely attention-driven FL paradigm that requires no direct localization labels.

📝 Abstract
Ensuring code correctness remains a challenging problem even as large language models (LLMs) become increasingly capable at code-related tasks. While LLM-based program repair systems can propose bug fixes using only a user's bug report, their effectiveness is fundamentally limited by their ability to perform fault localization (FL), a challenging problem for both humans and LLMs. Existing FL approaches rely on executable test cases, require training on costly and often noisy line-level annotations, or demand resource-intensive LLMs. In this paper, we present Bug Attention Probe (BAP), a method which learns state-of-the-art fault localization without any direct localization labels, outperforming traditional FL baselines and prompting of large-scale LLMs. We evaluate our approach across a variety of code settings, including real-world Java bugs from the standard Defects4J dataset as well as seven other datasets which span a diverse set of bug types and languages. Averaged across all eight datasets, BAP improves by 34.6% top-1 accuracy compared to the strongest baseline and 93.4% over zero-shot prompting GPT-4o. BAP is also significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost.
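The abstract's core idea, reading likely fault locations off an LLM's attention, can be illustrated with a minimal sketch. This is not the paper's BAP implementation: the attention weights below are synthetic, and the aggregation rule (average attention mass received by each line's tokens) is an assumption made for illustration only.

```python
import numpy as np

def rank_lines_by_attention(attn, token_line, n_lines):
    """Rank source lines by the attention mass their tokens receive.

    attn: array of shape (heads, T, T) with row-normalized attention weights.
    token_line: line index for each of the T tokens.
    """
    # Average over heads and query positions: how much attention
    # each token receives overall.
    received = attn.mean(axis=(0, 1))          # shape (T,)
    scores = np.zeros(n_lines)
    for t, ln in enumerate(token_line):
        scores[ln] += received[t]
    return np.argsort(-scores)                 # most- to least-suspicious lines

# Synthetic example: 6 tokens over 3 lines; both heads attend
# heavily to line 1's tokens (positions 2 and 3).
token_line = [0, 0, 1, 1, 2, 2]
attn = np.full((2, 6, 6), 1 / 6)
attn[:, :, 2:4] += 0.5
attn /= attn.sum(axis=-1, keepdims=True)       # re-normalize each row
ranking = rank_lines_by_attention(attn, token_line, 3)
print(ranking[0])  # -> 1, the line receiving the most attention
```

In a real setting the attention tensor would come from a forward pass of a code LLM, and BAP additionally learns which heads and layers are informative; this sketch only shows the aggregation step.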
Problem

Research questions and friction points this paper is trying to address.

Existing fault localization (FL) methods rely on executable test cases, costly and often noisy line-level annotations, or resource-intensive LLMs.
The paper proposes the Bug Attention Probe (BAP) for scalable FL without any direct localization labels.
BAP improves both accuracy and efficiency across diverse bug datasets and programming languages.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns fault localization without direct labels
Outperforms baselines across diverse code datasets
Outperforms large open-weight models at a small fraction of the computational cost
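The first innovation, learning localization without line-level labels, resembles multiple-instance learning: only program-level buggy/clean labels supervise training, and per-line scores are max-pooled into a program-level prediction, so the probe implicitly learns to score individual lines. The toy sketch below illustrates that idea; it is not the paper's training objective, and the two-dimensional features and hand-built programs are fabricated for illustration.

```python
import math

def train_probe(programs, labels, dim, steps=200, lr=0.5):
    """Learn a linear line-scoring probe from program-level labels only.

    programs: list of programs, each a list of per-line feature vectors.
    labels: 1 for buggy programs, 0 for clean ones (no line labels).
    """
    w = [0.0] * dim
    for _ in range(steps):
        for lines, y in zip(programs, labels):
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in lines]
            i = max(range(len(lines)), key=scores.__getitem__)  # top line
            p = 1 / (1 + math.exp(-scores[i]))  # program-level bug probability
            for d in range(dim):                # logistic step on pooled line
                w[d] += lr * (y - p) * lines[i][d]
    return w

# Hypothetical per-line features, e.g. [attention_mass, other_signal].
buggy = [[0.0, 1.0], [3.0, 0.0], [0.0, 1.0]]   # bug planted on line 1
clean = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
w = train_probe([buggy, clean], [1, 0], dim=2)

# Localization: score the buggy program line by line.
scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in buggy]
print(max(range(3), key=scores.__getitem__))   # -> 1, the planted bug line
```

The design point is that the max-pooling ties the program-level loss to the single most suspicious line, so correct line scoring emerges as a by-product of predicting whether a program is buggy at all.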