A Purpose-oriented Study on Open-source Software Commits and Their Impacts on Software Quality

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the poorly understood relationship between commit intent and software quality in open-source projects. We propose the first fine-grained commit classification framework grounded in developer intent. Leveraging natural language processing, we semantically parse commit messages and develop a supervised classifier via BERT fine-tuning. A large-scale empirical analysis across 127 mainstream open-source projects reveals statistically significant causal associations between commit purposes and key quality metrics—including defect introduction rate and performance degradation. Our model achieves an F1-score of 89.3%. Based on these findings, we distill six actionable, evidence-based best practices for high-quality commit authoring. This work contributes a novel conceptual lens, a robust methodological approach, and practical guidelines for improving open-source software quality through intent-aware development practices.

Technology Category

Application Category

📝 Abstract
Developing software with the source code open to the public is prevalent; however, similar to its closed counter part, open-source has quality problems, which cause functional failures, such as program breakdowns, and non-functional, such as long response times. Previous researchers have revealed when, where, how and what developers contribute to projects and how these aspects impact software quality. However, there has been little work on how different categories of commits impact software quality. To improve open-source software, we conducted this preliminary study to categorize commits, train prediction models to automate the classification, and investigate how commit quality is impacted by commits of different purposes. By identifying these impacts, we will establish a new set of guidelines for committing changes that will improve the quality.
Problem

Research questions and friction points this paper is trying to address.

Categorize open-source software commits and their quality impacts.
Train models to automate commit classification for quality prediction.
Establish guidelines to improve software quality through purposeful commits.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Categorize commits to analyze quality impacts
Train models for automated commit classification
Establish guidelines for quality-improving commits
🔎 Similar Papers
No similar papers found.
Jincheng He
Jincheng He
University of Southern California
Software EngineeringData Mining
Z
Zhongheng He
Department of Computer Science, University