An Empirical Study on the Impact of Code Duplication-aware Refactoring Practices on Quality Metrics

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the mapping between refactoring operations performed to eliminate code duplication and software design quality metrics, along with their empirical impact. Leveraging 332 manually labeled deduplication-refactoring commits from 128 open-source Java projects, we integrate code mining, extraction of 32 structural metrics, Wilcoxon signed-rank tests, and semantic analysis of commit messages. To our knowledge, this is the first systematic empirical validation of how widely adopted quality metrics respond to duplication-removal intent. Results show that most metrics capture this intent, yet effects are highly heterogeneous: cohesion and maintainability significantly improve, whereas complexity and coupling either remain unchanged or deteriorate. The findings expose critical limitations and contextual boundaries of conventional quality metrics in refactoring scenarios, challenging assumptions about their universality. This work provides empirical grounding for refining quality models and assessing refactoring effectiveness in practice.
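
The summary mentions Wilcoxon signed-rank tests. A common way to apply that test in this kind of study is to pair each metric's value before and after a refactoring commit and ask whether the paired differences are centered on zero. The sketch below assumes exactly that setup. It is plain Python using a normal approximation with tie correction (not the exact small-sample test that statistical libraries may use), and the function name `wilcoxon_signed_rank` and all sample values are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(
    before: Sequence[float], after: Sequence[float]
) -> Tuple[float, float, float]:
    """Paired Wilcoxon signed-rank test (normal approximation, two-sided).

    Returns (w_plus, w_minus, p_value). Pairs with zero difference are
    dropped before ranking, and tied absolute differences get average ranks.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Differences after - before; zero differences carry no sign information.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order, assigning the average rank to each tie group.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_correction = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_correction += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation of the null distribution of W+ (with tie correction).
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_correction / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_value


if __name__ == "__main__":
    # Hypothetical per-commit values of one metric before and after refactoring.
    before = [12.0, 9.5, 14.0, 11.0, 10.5, 13.0, 8.0, 15.5]
    after = [10.0, 9.0, 11.5, 11.0, 9.0, 12.0, 8.5, 13.0]
    w_plus, w_minus, p = wilcoxon_signed_rank(before, after)
    print(f"W+={w_plus}, W-={w_minus}, p={p:.4f}")
```

A small p-value suggests a systematic shift in the metric across commits; comparing W+ with W- gives the direction of that shift, which still has to be interpreted against whether a higher value of that particular metric is better or worse.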

📝 Abstract
Context: Code refactoring is widely recognized as an essential software engineering practice that improves the understandability and maintainability of source code. Several studies have attempted to detect refactoring activities by mining software repositories, which makes it possible to collect, analyze, and derive actionable, data-driven insights about refactoring practices within software projects.
Objective: Our goal is to identify, among the various quality models presented in the literature, those that align with developers' intent to eliminate code duplicates when they explicitly state that they refactor the code for this purpose.
Method: We extract a corpus of 332 refactoring commits applied and documented by developers during their daily changes in 128 open-source Java projects. In particular, we extract 32 structural metrics and identify code duplicate removal commits, together with their corresponding refactoring operations, as perceived by software engineers. We then empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics.
Results: The statistical analysis shows that (i) some state-of-the-art metrics capture the developer's intention of removing code duplication; and (ii) some metrics are emphasized more than others. We confirm that various structural metrics can effectively represent code duplication, with different impacts on software quality: some metrics indicate improvement, while others may indicate degradation.
Conclusion: Most of the mapped metrics associated with the main quality attributes successfully capture developers' intentions to remove code duplicates, as evidenced by the commit messages. However, certain metrics do not fully capture these intentions.

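
The abstract targets commits in which developers explicitly state that they refactor to remove duplication, and then asks whether each metric moves in the direction its quality attribute would predict. A minimal sketch of those two steps follows, under clearly hypothetical assumptions: the keyword regexes, the `Commit` record, and the three attribute names with their "higher is better" orientation are illustrative placeholders, not the study's 32 metrics or its labeling procedure (which the summary describes as manual).

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical keyword patterns for a first-pass filter on commit messages.
REFACTOR_RE = re.compile(r"\brefactor\w*", re.IGNORECASE)
DUPLICATION_RE = re.compile(r"\b(duplicat\w*|clones?\b|copy[- ]?past\w*)", re.IGNORECASE)

# Hypothetical orientation of each attribute-level score: True means higher is better.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "cohesion": True,
    "coupling": False,
    "complexity": False,
}


@dataclass
class Commit:
    sha: str
    message: str
    metrics_before: Dict[str, float]
    metrics_after: Dict[str, float]


def is_dedup_refactoring(message: str) -> bool:
    """True if the message mentions both refactoring and duplication removal."""
    return bool(REFACTOR_RE.search(message)) and bool(DUPLICATION_RE.search(message))


def classify_change(metric: str, before: float, after: float) -> str:
    """Label a metric change as improved, degraded, or unchanged."""
    if after == before:
        return "unchanged"
    went_up = after > before
    return "improved" if went_up == HIGHER_IS_BETTER[metric] else "degraded"


def assess(commit: Commit) -> Dict[str, str]:
    """Classify every known metric that was measured both before and after."""
    return {
        metric: classify_change(
            metric, commit.metrics_before[metric], commit.metrics_after[metric]
        )
        for metric in HIGHER_IS_BETTER
        if metric in commit.metrics_before and metric in commit.metrics_after
    }


if __name__ == "__main__":
    # Hypothetical commit with made-up metric values.
    commit = Commit(
        sha="abc1234",
        message="Refactor: extract shared helper to remove duplicated parsing code",
        metrics_before={"cohesion": 0.42, "coupling": 7.0, "complexity": 18.0},
        metrics_after={"cohesion": 0.55, "coupling": 8.0, "complexity": 18.0},
    )
    if is_dedup_refactoring(commit.message):
        print(commit.sha, assess(commit))
```

A keyword filter like this is only a coarse pre-selection: it can match unrelated uses of words such as "clone" and miss messages phrased differently, so candidate commits would still need manual review.
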
Problem

Research questions and friction points this paper is trying to address.

Impact of duplication-removal refactoring on quality metrics
Alignment of quality models with developer intentions
Evaluation of 32 structural metrics across 128 open-source Java projects

Innovation

Methods, ideas, or system contributions that make the work stand out.

Code refactoring for duplication removal
Structural metrics for quality assessment
Empirical analysis of refactoring impacts