🤖 AI Summary
This study examines the common assumption that refactoring makes code less bug-prone by investigating the long-term relationship between refactoring characteristics and bugs. Method: A longitudinal empirical analysis of 12 open-source projects covering 27,450 refactorings, 6,051 reported bugs, and 49,250 bugs detected with static analysis tools, distinguishing refactoring types, single versus composite refactorings, and floss versus root-canal refactorings. Contribution/Results: Refactored code is, overall, less bug-prone than non-refactored code, but (1) a code element that undergoes multiple refactorings is not less bug-prone than one that undergoes a single refactoring; (2) single refactorings, i.e., those not performed together with other refactorings in the same commit, often induce bugs across all analyzed projects; (3) code elements affected by floss refactorings, i.e., refactorings committed together with non-refactoring changes, are often bug-prone; and (4) many refactoring-induced bugs cannot be revealed by state-of-the-art techniques for detecting behavior-preserving refactorings.
📝 Abstract
Refactoring is a common practice in software development, aimed at improving the internal code structure in order to make it easier to understand and modify. Consequently, it is often assumed that refactoring makes the code less prone to bugs. However, in practice, refactoring is a complex task that is applied in different ways (e.g., various refactoring types, single vs. composite refactorings) and with a variety of purposes (e.g., root-canal vs. floss refactoring). Therefore, certain refactorings can inadvertently make the code more prone to bugs. Unfortunately, there is limited research in the literature on the long-term relationship between the different characteristics of refactorings and bugs. This paper presents a longitudinal study of 12 open-source software projects, in which 27,450 refactorings, 6,051 reported bugs, and 49,250 bugs detected with static analysis tools were analyzed. While our study confirms the common intuition that refactored code is less bug-prone than non-refactored code, we also extend or contradict the existing body of knowledge in other ways. First, a code element that undergoes multiple refactorings is not less bug-prone than an element that undergoes a single refactoring. A single refactoring is one that is not performed in conjunction with other refactorings in the same commit. Second, single refactorings often induce the occurrence of bugs across all analyzed projects. Third, code elements affected by refactorings made in conjunction with other non-refactoring changes in the same commit (i.e., floss refactorings) are often bug-prone. Finally, many of these bugs induced by refactoring cannot be revealed with state-of-the-art techniques for detecting behavior-preserving refactorings.
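The commit-level definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling: the `Change` record, its fields, and `classify_refactorings` are hypothetical names introduced here. The sketch also assumes two simplifications the abstract does not state: any commit with more than one refactoring is treated as "composite", and any commit containing only refactorings is treated as "root-canal".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One change to a code element (hypothetical record, not from the paper)."""
    commit: str           # commit identifier
    element: str          # affected code element, e.g. a method
    is_refactoring: bool  # True if the change was flagged as a refactoring


def classify_refactorings(changes):
    """Map (commit, element) of each refactoring to a (batch, tactic) label."""
    by_commit = defaultdict(list)
    for change in changes:
        by_commit[change.commit].append(change)

    labels = {}
    for group in by_commit.values():
        refactorings = [c for c in group if c.is_refactoring]
        # Single: no other refactoring in the same commit (abstract's definition).
        # Composite: assumed here to mean more than one refactoring per commit.
        batch = "single" if len(refactorings) == 1 else "composite"
        # Floss: refactoring committed together with non-refactoring changes.
        # Root-canal: assumed here to mean a refactoring-only commit.
        tactic = "floss" if len(refactorings) < len(group) else "root-canal"
        for r in refactorings:
            labels[(r.commit, r.element)] = (batch, tactic)
    return labels


# Made-up example data for illustration only.
changes = [
    Change("c1", "Parser.parse", True),
    Change("c2", "Lexer.next", True),
    Change("c2", "Lexer.peek", True),
    Change("c3", "Cache.get", True),
    Change("c3", "Cache.put", False),
]
labels = classify_refactorings(changes)
```

In this toy input, the refactoring in `c3` is both single and floss, which shows that the two dimensions are independent: one describes how many refactorings share a commit, the other whether non-refactoring changes share it.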