🤖 AI Summary
Neural program repair methods frequently fail because model context windows cannot accommodate distant identifiers (variables or function names defined in other functions or files). This work systematically identifies the absence of such distant identifiers as an important cause of failure in mainstream neural repair approaches. To address it, the authors propose *repair ingredient extraction*, establishing *ingredient scanning* as a subtask: in their framework, ScanFix, a lightweight scanner model reads file- and project-level code and selects salient identifiers via semantic similarity, which are then injected into the input of a repair language model. On standard benchmarks, ScanFix achieves relative improvements in repair success rate of up to 31%, although it is outperformed by a repair model with a large (>5k-token) input window. Injecting identifiers taken from the ground-truth fixes boosts performance even further, supporting both the effectiveness and the potential of ingredient extraction and injection.
📝 Abstract
Deep learning and language models increasingly dominate automated program repair research. While earlier generate-and-validate approaches could find and use fix ingredients at the file or even project level, neural language models are limited to the code that fits their input window. In this work we investigate how important identifier ingredients are in neural program repair and present ScanFix, an approach that leverages an additional scanner model to extract identifiers from a bug's file and, potentially, project-level context. We find that a lack of knowledge of far-away identifiers is an important cause of failed repairs. Augmenting the repair model's input with scanner-extracted identifiers yields relative improvements of up to 31%. However, ScanFix is outperformed by a model with a large (>5k-token) input window. When ingredients from the ground-truth fix are passed instead, improvements are even higher. This shows that, with refined extraction techniques, ingredient scanning, like fix-candidate ranking, could become an important subtask of future automated repair systems. At the same time, it also demonstrates that this idea is subject to Sutton's bitter lesson and may be rendered unnecessary by code models with ever-increasing context windows.
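The scan-then-repair pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the character-trigram embedding is a toy stand-in for ScanFix's learned scanner model, and the function names (`scan_ingredients`, `build_repair_prompt`) and the prompt format are hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Toy stand-in for the scanner's encoder: a bag of character
    # trigrams. ScanFix uses a learned model instead.
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse feature vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def scan_ingredients(buggy_snippet: str, context_code: str, top_k: int = 5) -> list:
    # Candidate ingredients: identifiers that appear in the file- or
    # project-level context but not in the buggy snippet itself.
    ident_re = r"[A-Za-z_][A-Za-z0-9_]*"
    candidates = set(re.findall(ident_re, context_code))
    candidates -= set(re.findall(ident_re, buggy_snippet))
    # Rank candidates by semantic similarity to the buggy code and
    # keep the top_k most salient ones.
    query = toy_embed(buggy_snippet)
    ranked = sorted(candidates,
                    key=lambda ident: cosine(query, toy_embed(ident)),
                    reverse=True)
    return ranked[:top_k]

def build_repair_prompt(buggy_snippet: str, ingredients: list) -> str:
    # Inject the scanned ingredients ahead of the buggy code so the
    # repair model can see identifiers its window would otherwise miss.
    header = "// ingredients: " + ", ".join(ingredients)
    return header + "\n" + buggy_snippet
```

A usage example: given a buggy `checksum` function and a context containing `compute_checksum`, the scanner ranks `compute_checksum` highly because it is lexically and (in the real system, semantically) close to the buggy code, and the repair prompt then exposes it to the repair model.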