Assessing the Bug-Proneness of Refactored Code: A Longitudinal Multi-Project Study

📅 2025-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the common assumption that refactoring makes code less bug-prone by investigating the long-term relationship between the characteristics of refactorings and bugs. Method: A longitudinal empirical analysis of 12 open-source projects covering 27,450 refactorings, 6,051 reported bugs, and 49,250 bugs detected with static analysis tools. Contribution/Results: Refactored code is overall less bug-prone than non-refactored code, but (1) a code element that undergoes multiple refactorings is not less bug-prone than one that undergoes a single refactoring; (2) single refactorings, i.e., those not performed together with other refactorings in the same commit, often induce bugs across all analyzed projects; (3) code elements affected by floss refactorings, i.e., refactorings made together with non-refactoring changes in the same commit, are often bug-prone; and (4) many refactoring-induced bugs cannot be revealed by state-of-the-art techniques for detecting behavior-preserving refactorings.

📝 Abstract

Refactoring is a common practice in software development, aimed at improving the internal code structure in order to make it easier to understand and modify. Consequently, it is often assumed that refactoring makes the code less prone to bugs. However, in practice, refactoring is a complex task that is applied in different ways (e.g., various refactoring types, single vs. composite refactorings) and with a variety of purposes (e.g., root-canal vs. floss refactoring). Therefore, certain refactorings can inadvertently make the code more prone to bugs. Unfortunately, there is limited research in the literature on the long-term relationship between the different characteristics of refactorings and bugs. This paper presents a longitudinal study of 12 open source software projects, where 27,450 refactorings, 6,051 reported bugs, and 49,250 bugs detected with static analysis tools were analyzed. While our study confirms the common intuition that refactored code is less bug-prone than non-refactored code, we also extend or contradict the existing body of knowledge in other ways. First, a code element that undergoes multiple refactorings is not less bug-prone than an element that undergoes a single refactoring. A single refactoring is one that is not performed in conjunction with other refactorings in the same commit. Second, single refactorings often induce the occurrence of bugs across all analyzed projects. Third, code elements affected by refactorings made in conjunction with other non-refactoring changes in the same commit (i.e., floss refactorings) are often bug-prone. Finally, many of these refactoring-induced bugs cannot be revealed with state-of-the-art techniques for detecting behavior-preserving refactorings.
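The commit-level definitions in the abstract can be illustrated with a minimal sketch. This is not the authors' tooling: the `Commit` record, its field names, and the `classify` function are hypothetical. Two labels rest on assumptions the abstract does not state: "composite" is treated here as any commit with more than one refactoring, and "root-canal" as a commit containing only refactorings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Commit:
    """Hypothetical commit record; the field names are illustrative only."""
    sha: str
    refactorings: int   # refactoring operations detected in this commit
    other_changes: int  # non-refactoring changes in the same commit


def classify(commit: Commit) -> Optional[Tuple[str, str]]:
    """Label a commit as (single|composite, floss|root-canal).

    Returns None when the commit contains no refactoring at all.
    """
    if commit.refactorings == 0:
        return None
    # Single: not performed together with other refactorings in the same commit.
    # Composite (assumption): more than one refactoring in the commit.
    arity = "single" if commit.refactorings == 1 else "composite"
    # Floss: refactoring mixed with non-refactoring changes in the same commit.
    # Root-canal (assumption): the commit contains refactorings only.
    tactic = "floss" if commit.other_changes > 0 else "root-canal"
    return arity, tactic


if __name__ == "__main__":
    for c in (Commit("a1", 1, 0), Commit("b2", 1, 4), Commit("c3", 3, 2)):
        print(c.sha, classify(c))
```

Under these assumptions, a commit with one refactoring and several unrelated edits falls into the "single" and "floss" groups at once, the two groups the study reports as often bug-prone.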

Problem

Research questions and friction points this paper is trying to address.

Examines long-term impact of refactoring on code bug-proneness

Assesses bug risks of single vs. composite refactoring types

Identifies limitations in detecting refactoring-induced bugs with current techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Longitudinal study of 12 open source projects

Analyzed 27,450 refactorings and 55,301 bugs (6,051 reported bugs and 49,250 detected with static analysis tools)

Evaluated bug-proneness of different refactoring types
