🤖 AI Summary
This study addresses the poorly understood relationship between commit intent and software quality in open-source projects. We propose the first fine-grained commit classification framework grounded in developer intent. Leveraging natural language processing, we semantically parse commit messages and develop a supervised classifier via BERT fine-tuning. A large-scale empirical analysis across 127 mainstream open-source projects reveals statistically significant causal associations between commit purposes and key quality metrics, including defect introduction rate and performance degradation. Our model achieves an F1-score of 89.3%. Based on these findings, we distill six actionable, evidence-based best practices for high-quality commit authoring. This work contributes a novel conceptual lens, a robust methodological approach, and practical guidelines for improving open-source software quality through intent-aware development practices.
📝 Abstract
Developing software with the source code open to the public is prevalent; however, like its closed-source counterpart, open-source software has quality problems, which cause functional failures, such as program breakdowns, and non-functional failures, such as long response times. Previous researchers have revealed when, where, how, and what developers contribute to projects and how these aspects impact software quality. However, there has been little work on how different categories of commits impact software quality. To improve open-source software, we conducted this preliminary study to categorize commits, train prediction models to automate the classification, and investigate how commit quality is impacted by commits of different purposes. By identifying these impacts, we will establish a new set of guidelines for committing changes that will improve quality.