AI Summary
Large language models (LLMs) frequently generate unit tests containing syntactic and semantic errors, resulting in high discard rates. To address this, this paper proposes the first systematic "test-repair-driven" enhancement paradigm: it employs rule-based static analysis to detect errors and leverages re-prompting to guide LLMs in automatic test repair. This approach transforms initially invalid, erroneous tests into high-value test cases and high-quality generation seeds, significantly improving the quality and efficiency of subsequent test generation. Experiments across six open-source projects show that the method achieves, on average, a 32.06% improvement in line coverage and a 21.77% gain in mutant kill rate over a pure-LLM baseline. Against four state-of-the-art baselines, it consistently leads by roughly 20–22% in both coverage and mutant-killing capability, while incurring comparable computational overhead.
Abstract
Recent advances in automated test generation utilise language models to produce unit tests. While effective, language models tend to generate many incorrect tests with respect to both syntax and semantics. Although such incorrect tests can be easily detected and discarded, they constitute a "missed opportunity" -- if fixed, they are often valuable as they directly add testing value (they effectively target the underlying program logic to be tested) and indirectly form good seeds for generating additional tests. To this end, we propose a simple technique for repairing some of these incorrect tests through a combination of rule-based static analysis and re-prompting. We evaluate this simple approach, named YATE, on a set of 6 open-source projects and show that it can effectively produce tests that cover on average 32.06% more lines and kill 21.77% more mutants than a plain LLM-based method. We also compare YATE with four other LLM-based methods, namely HITS, SYMPROMPT, TESTSPARK and COVERUP, and show that it produces tests that cover substantially more code. YATE achieves 22% higher line coverage, 20% higher branch coverage and kills 20% more mutants at a comparable cost (number of calls to LLMs).