AI Summary
To address inefficient test selection in continuous integration (CI) caused by growing codebases and test suites, this paper proposes a coverage-agnostic, change-driven test selection method. The approach represents each commit as a bag-of-words of its changed files and combines this with cross-file structural features and defect-proneness indicators to train a lightweight, scalable machine learning model. Crucially, it eliminates reliance on historical code coverage data, enabling end-to-end, commit-based test recommendation. Evaluated on a real-world industrial dataset, the method executes only 15% of the test suite, reduces test execution time by 5.9×, and accelerates the overall CI pipeline by 5.6×, while maintaining a 95.2% failure-detection rate. This significantly shortens developer feedback latency and improves defect interception.
Abstract
In modern software development, change-based testing plays a crucial role. However, as codebases expand and test suites grow, efficiently managing the testing process becomes increasingly challenging, especially given the high frequency of daily code commits. We propose Targeted Test Selection (T-TS), a machine learning approach for industrial test selection. Our key innovation is a data representation that represents commits as Bags-of-Words of changed files, incorporates cross-file and additional predictive features, and notably avoids the use of coverage maps. Deployed in production, T-TS was comprehensively evaluated against industry standards and recent methods using both internal and public datasets, measuring time efficiency and fault detection. On live industrial data, T-TS selects only 15% of tests, reduces execution time by $5.9\times$, accelerates the pipeline by $5.6\times$, and detects over 95% of test failures. The implementation is publicly available to support further research and practical adoption.
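To make the commit representation more concrete, the following is a minimal sketch of turning a commit's changed files into a bag-of-words vector. It is not the T-TS implementation: the abstract does not say how words are derived from files, so the path tokenisation here is an assumption (each changed file could equally be treated as a single word), and the names `path_tokens`, `commit_bag_of_words`, and `vectorize`, as well as the example file paths, are hypothetical.

```python
import re
from collections import Counter


def path_tokens(path: str) -> list[str]:
    """Split a file path into lowercase alphanumeric tokens.

    Assumption: words come from path components; the paper may define
    the vocabulary differently.
    """
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]


def commit_bag_of_words(changed_files: list[str]) -> Counter:
    """Count token occurrences across all files changed in one commit."""
    bag: Counter = Counter()
    for path in changed_files:
        bag.update(path_tokens(path))
    return bag


def vectorize(bag: Counter, vocabulary: list[str]) -> list[int]:
    """Project a bag onto a fixed vocabulary; unseen tokens count as 0."""
    return [bag[token] for token in vocabulary]


# Hypothetical commit touching two files in the same package.
commit = ["src/payments/refund_service.py", "src/payments/models.py"]
bag = commit_bag_of_words(commit)
features = vectorize(bag, ["payments", "refund", "checkout"])
```

A vector like `features` could then be concatenated with other per-commit features and passed to any standard classifier that scores each test's likelihood of failing. No code coverage map is needed to build it, since it depends only on the list of changed files.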