YATE: The Role of Test Repair in LLM-Based Unit Test Generation

📅 2025-07-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate unit tests containing syntactic and semantic errors, resulting in high discard rates. To address this, the paper proposes the first systematic "test repair"-driven enhancement paradigm: it employs rule-based static analysis to detect errors and re-prompting to guide LLMs in repairing the tests automatically. This approach turns initially invalid tests into high-value test cases and high-quality seeds for further generation, thereby improving subsequent test generation quality and efficiency. Experiments across six open-source projects show that the method achieves, on average, a 32.06% improvement in line coverage and a 21.77% gain in mutant kill rate over a pure-LLM baseline. Against four state-of-the-art baselines, it consistently delivers roughly 20-22% higher coverage and mutant-killing capability while incurring comparable computational overhead.
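
To make the pipeline concrete, below is a minimal sketch of such a detect-then-re-prompt repair loop. This is not the authors' implementation: `call_llm` is a hypothetical stand-in for any chat-completion client, pytest is assumed as the test runner, and the retry budget is arbitrary.

```python
# Minimal sketch of a YATE-style repair loop (illustrative, not the paper's code).
import ast
import subprocess
import tempfile
from pathlib import Path

def static_errors(test_source: str) -> str | None:
    """Rule-based static check: reject tests that do not even parse."""
    try:
        ast.parse(test_source)
        return None
    except SyntaxError as e:
        return f"SyntaxError at line {e.lineno}: {e.msg}"

def run_test(test_source: str) -> str | None:
    """Execute the candidate test with pytest; return the failure log, or None on success."""
    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test_candidate.py"
        test_file.write_text(test_source)
        proc = subprocess.run(["pytest", "-x", str(test_file)],
                              capture_output=True, text=True)
        return None if proc.returncode == 0 else proc.stdout

def repair(test_source: str, call_llm, max_rounds: int = 3) -> str | None:
    """Re-prompt the LLM with the observed error until the test passes or the budget runs out."""
    for _ in range(max_rounds):
        error = static_errors(test_source) or run_test(test_source)
        if error is None:
            return test_source  # repaired: keep it and reuse it as a generation seed
        test_source = call_llm(
            "The following unit test is broken.\n"
            f"Error:\n{error}\n\nTest:\n{test_source}\n\n"
            "Return only a corrected version of the test."
        )
    return None  # still broken after the budget: discard
```

A repaired test serves double duty in the paper's framing: it adds coverage directly, and it becomes a seed for prompting additional tests.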

๐Ÿ“ Abstract
Recent advances in automated test generation utilise language models to produce unit tests. While effective, language models tend to generate many incorrect tests with respect to both syntax and semantics. Although such incorrect tests can be easily detected and discarded, they constitute a "missed opportunity" -- if fixed, they are often valuable as they directly add testing value (they effectively target the underlying program logic to be tested) and indirectly form good seeds for generating additional tests. To this end, we propose a simple technique for repairing some of these incorrect tests through a combination of rule-based static analysis and re-prompting. We evaluate this simple approach, named YATE, on a set of 6 open-source projects and show that it can effectively produce tests that cover on average 32.06% more lines and kill 21.77% more mutants than a plain LLM-based method. We also compare YATE with four other LLM-based methods, namely HITS, SYMPROMPT, TESTSPARK and COVERUP, and show that it produces tests that cover substantially more code. YATE achieves 22% higher line coverage, 20% higher branch coverage and kills 20% more mutants at a comparable cost (number of calls to LLMs).
Problem

Research questions and friction points this paper is trying to address.

Repairing incorrect unit tests generated by LLMs
Improving test coverage and mutant-killing efficiency (metrics recapped after this list)
Comparing performance with other LLM-based test generation methods
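
For reference, the two headline metrics are simple ratios; the definitions below are the standard generic ones, not tied to the paper's tooling.

```python
# Generic definitions of the two metrics used to compare test suites.
def line_coverage(covered_lines: int, executable_lines: int) -> float:
    return covered_lines / executable_lines

def mutation_score(killed_mutants: int, total_mutants: int) -> float:
    """A mutant is 'killed' when at least one test fails on the mutated program."""
    return killed_mutants / total_mutants

# Example: 412 of 500 lines covered, 130 of 200 mutants killed.
print(f"{line_coverage(412, 500):.1%}")   # 82.4%
print(f"{mutation_score(130, 200):.1%}")  # 65.0%
```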
Innovation

Methods, ideas, or system contributions that make the work stand out.

Repairs incorrect tests via rule-based static analysis (see the sketch after this list)
Uses re-prompting to fix semantic errors
Boosts coverage by 32% versus plain LLM
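
As an illustration of the rule-based side, the sketch below implements one plausible repair rule: detecting a module that a generated test references but never imports, and prepending the import. The rule and the `KNOWN_MODULES` allow-list are assumptions chosen for illustration, not the paper's actual rule set.

```python
# One hypothetical rule-based repair: add an import the generated test forgot.
import ast

KNOWN_MODULES = {"math", "json", "re", "pytest"}  # assumed allow-list

def add_missing_imports(test_source: str) -> str:
    """If the test dereferences a known module it never imports, prepend the import."""
    tree = ast.parse(test_source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
    # Names used in `module.attribute` form anywhere in the test body.
    used = {
        node.value.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
    }
    missing = (used & KNOWN_MODULES) - imported
    return "".join(f"import {m}\n" for m in sorted(missing)) + test_source
```

For example, a generated test asserting `math.isclose(area(1), math.pi)` without `import math` would be rewritten with the import prepended instead of being discarded.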
🔎 Similar Papers
No similar papers found.