How Much Can a Behavior-Preserving Changeset Be Decomposed into Refactoring Operations?

📅 2025-08-16
🤖 AI Summary
This work examines how far behavior-preserving modifications, such as refactorings, can be decomposed into more primitive operations when they are mixed with behavior-altering changes. As an initial validation, we measure how much of the difference between functionally-equivalent method pairs can be explained as refactoring operations. With an existing refactoring detector, only 33.9% of the changes are identified as refactoring operations. Adding 67 newly defined functionally-equivalent operations increases the coverage by over 128%. An investigation of the remaining unexplained differences suggests improvement opportunities for refactoring-aware approaches to understanding and separating mixed changes.

📝 Abstract
Developers sometimes mix behavior-preserving modifications, such as refactorings, with behavior-altering modifications, such as feature additions. Several approaches have been proposed to support understanding such modifications by separating them into those two parts. Such refactoring-aware approaches are expected to be particularly effective when the behavior-preserving parts can be decomposed into a sequence of more primitive behavior-preserving operations, such as refactorings, but this has not been explored. In this paper, as an initial validation, we quantify how much of the behavior-preserving modifications can be decomposed into refactoring operations using a dataset of functionally-equivalent method pairs. As a result, when using an existing refactoring detector, only 33.9% of the changes could be identified as refactoring operations. In contrast, when including 67 newly defined functionally-equivalent operations, the coverage increased by over 128%. Further investigation into the remaining unexplained differences was conducted, suggesting improvement opportunities.

Problem

Research questions and friction points this paper is trying to address.

Quantify decomposition of behavior-preserving changes into refactorings
Evaluate coverage of existing refactoring detection methods
Identify improvement opportunities for unexplained behavior-preserving changes
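
The coverage figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes that coverage means the fraction of changed units explained by detected operations, and that "increased by over 128%" is a relative increase (a rise of 128 percentage points from 33.9% would exceed 100%). Both readings are assumptions made for illustration, and the function names are hypothetical; the page does not give the paper's exact definition.

```python
def coverage(total_changes: int, explained_changes: int) -> float:
    """Fraction of changed units accounted for by detected operations.

    Assumed definition for illustration only.
    """
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= explained_changes <= total_changes:
        raise ValueError("explained_changes must lie in [0, total_changes]")
    return explained_changes / total_changes


def after_relative_increase(base: float, relative_increase: float) -> float:
    """Apply a relative increase, e.g. 1.28 for +128%."""
    return base * (1.0 + relative_increase)


# Baseline reported in the abstract: 33.9% with an existing detector.
baseline = 0.339

# Lower bound implied by "over 128%" under the relative-increase reading.
implied = after_relative_increase(baseline, 1.28)

print(f"baseline coverage: {baseline:.1%}")
print(f"implied coverage:  more than {implied:.1%}")
```

Under these assumptions, the extended operation set would explain somewhat more than three quarters of the changes, which still leaves a portion unexplained.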

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decompose changesets into refactoring operations
Use functionally-equivalent method pairs dataset
Define new functionally-equivalent operations
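
To make the idea of decomposing a functionally-equivalent method pair concrete, here is a minimal sketch. The functions and the two steps are hypothetical illustrations, not taken from the paper's dataset or its 67 operations: step 1 is a classic Rename Variable refactoring, while step 2 (replacing an index loop with direct iteration) is the kind of finer-grained behavior-preserving edit that a refactoring catalog may not name.

```python
def total_v1(values):
    """Original version: index-based loop."""
    result = 0
    for i in range(len(values)):
        result = result + values[i]
    return result


def total_step1(values):
    """After step 1: local variable 'result' renamed to 'acc'."""
    acc = 0
    for i in range(len(values)):
        acc = acc + values[i]
    return acc


def total_v2(values):
    """After step 2: index loop replaced with direct iteration."""
    acc = 0
    for v in values:
        acc += v
    return acc


def agree_on(samples, *versions):
    """Return True if all versions give the same result on every sample.

    Agreement on samples is evidence of equivalence, not a proof.
    """
    return all(len({f(list(s)) for f in versions}) == 1 for s in samples)


samples = [[], [1], [1, 2, 3], [-5, 5, 10]]
print(agree_on(samples, total_v1, total_step1, total_v2))
```

Each intermediate version behaves the same as its predecessor, so the whole change from `total_v1` to `total_v2` is explained by a sequence of two primitive operations. A detector that only recognizes step 1 would leave step 2 as an unexplained difference.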
Kota Someya
School of Computing, Institute of Science Tokyo, Tokyo, Japan
Lei Chen
School of Computing, Institute of Science Tokyo, Tokyo, Japan
Michael J. Decker
Department of Computer Science, Bowling Green State University, Bowling Green, OH, USA
Shinpei Hayashi
Institute of Science Tokyo
Software Engineering, Refactoring, Software Evolution, Software Maintenance