How Much Can a Behavior-Preserving Changeset Be Decomposed into Refactoring Operations?

📅 2025-08-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies how well behavior-preserving modifications, such as refactorings, can be identified within mixed changes that also alter behavior. As an initial validation, the authors quantify how much of the behavior-preserving part can be decomposed into refactoring operations, using a dataset of functionally equivalent method pairs. An existing refactoring detector identifies only 33.9% of the changes as refactoring operations. Adding 67 newly defined functionally equivalent operations increases that coverage by over 128%. An investigation of the differences that remain unexplained suggests opportunities for improving refactoring detection, change understanding, and the separation of mixed changes.

📝 Abstract

Developers sometimes mix behavior-preserving modifications, such as refactorings, with behavior-altering modifications, such as feature additions. Several approaches have been proposed to support understanding such modifications by separating them into those two parts. Such refactoring-aware approaches are expected to be particularly effective when the behavior-preserving parts can be decomposed into a sequence of more primitive behavior-preserving operations, such as refactorings, but this has not been explored. In this paper, as an initial validation, we quantify how much of the behavior-preserving modifications can be decomposed into refactoring operations using a dataset of functionally-equivalent method pairs. As a result, when using an existing refactoring detector, only 33.9% of the changes could be identified as refactoring operations. In contrast, when including 67 newly defined functionally-equivalent operations, the coverage increased by over 128%. Further investigation into the remaining unexplained differences was conducted, suggesting improvement opportunities.
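The abstract reports a baseline coverage and a relative increase but not the resulting figure. The sketch below works through that arithmetic under one plausible reading, namely that "increased by over 128%" is relative to the 33.9% baseline (an absolute increase of 128 percentage points would exceed 100%). The function names and the ratio-based definition of coverage are assumptions for illustration, not the paper's definitions.

```python
def coverage(total_changes: int, explained_changes: int) -> float:
    """Fraction of changes accounted for by detected operations.

    Assumption: coverage is a simple ratio; the paper may define it differently.
    """
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return explained_changes / total_changes


def apply_relative_increase(base: float, relative_increase: float) -> float:
    """Return base grown by a relative increase (1.28 means +128%)."""
    return base * (1.0 + relative_increase)


# Figures taken from the abstract.
baseline = 0.339            # coverage with the existing refactoring detector
relative_increase = 1.28    # "over 128%", so this gives a lower bound

extended_lower_bound = apply_relative_increase(baseline, relative_increase)
print(f"{extended_lower_bound:.3f}")
```

Under this reading, coverage with the 67 additional operations would be somewhat above 77%, which still leaves a portion of the differences unexplained.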

Problem

Research questions and friction points this paper is trying to address.

Quantify decomposition of behavior-preserving changes into refactorings

Evaluate coverage of existing refactoring detection methods

Identify improvement opportunities for unexplained behavior-preserving changes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decompose changesets into refactoring operations

Use a dataset of functionally equivalent method pairs

Define new functionally-equivalent operations
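To make the idea of a functionally equivalent method pair concrete, here is a small hypothetical pair, not taken from the paper's dataset. The two functions differ syntactically (an explicit loop versus a generator expression), yet they return the same result. A rewrite like this is the sort of difference that may fall outside a standard refactoring catalog. The helper `agree_on` is also an assumption for illustration; agreeing on sample inputs only suggests equivalence and does not prove it.

```python
def total_before(values):
    """Original version: accumulate positive values with an explicit loop."""
    total = 0
    for v in values:
        if v > 0:
            total = total + v
    return total


def total_after(values):
    """Rewritten version: same behavior expressed with sum() and a generator."""
    return sum(v for v in values if v > 0)


def agree_on(samples):
    """Check that both versions return the same result on every sample input."""
    return all(total_before(s) == total_after(s) for s in samples)


samples = [[], [1, 2, 3], [-1, 0, 4], [5, -5, 5]]
print(agree_on(samples))
```

Decomposing such a pair means explaining the difference between the two bodies as a sequence of named behavior-preserving steps, which is the coverage the study measures.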
