🤖 AI Summary
This study addresses the persistent occurrence of software defects after release, particularly in C/C++ and Java systems, whose underlying causes remain poorly understood. Through a large-scale empirical analysis of over 14k defects mined from open-source projects, the work systematically compares pre-release and post-release defect characteristics using a broad suite of metrics (code complexity, size, structure, change frequency, and development history) and employs statistical modeling to uncover key patterns. It finds that post-release defects are concentrated in older, frequently modified, high-churn components, and that they stem more from evolutionary and process dynamics than from code structure alone. Such defects also typically require longer and more complex fixes than pre-release ones, which offers empirical grounding for targeted testing strategies and improved reliability assurance.
📝 Abstract
Understanding how software defects manifest and evolve in production environments is critical for improving reliability. While previous research has largely focused on pre-release defects, the nature of residual faults, i.e., those escaping testing and surfacing post-release, remains poorly understood. This paper presents a large-scale characterization of pre- and post-release defects across C/C++ and Java systems, encompassing over 14k defects mined from open-source projects. We employ a broad suite of software metrics to capture diverse code attributes such as complexity, size, structure, and development history.
Results show that post-release defects are concentrated in older, frequently modified, and high-churn components, and typically require longer and more complex fixes than pre-release defects. These findings indicate that residual defects arise more from evolutionary and process dynamics than from code structure alone, suggesting that reliability engineering should prioritize targeted testing in mature and complex code regions.