🤖 AI Summary
Large language models (LLMs) frequently generate unit tests with compilation or runtime errors, exhibit low code coverage, and suffer from repetition suppression that undermines repair effectiveness. Method: This paper proposes an automated test enhancement framework based on co-evolutionary generation and repair. It introduces a template-based repair strategy (the first applied to fixing LLM-generated test cases), embeds dynamic coverage feedback directly into the LLM generation loop, and employs positive prompt injection to mitigate repetition suppression. The framework integrates template-driven repair, coverage-guided feedback, and iterative generate-and-repair cycles. Results: Experiments show an 18% improvement in test pass rate and a 20% increase in line coverage over baseline methods. Moreover, it achieves higher coverage than EvoSuite using only 50% as many test cases, demonstrating significant gains in both test quality and efficiency.
📝 Abstract
Unit testing is crucial for detecting bugs in individual program units but is time-consuming and labor-intensive. Recently, large language models (LLMs) have demonstrated remarkable capabilities in generating unit test cases. However, several problems limit their ability to generate high-quality unit test cases: (1) compilation and runtime errors caused by LLM hallucination; (2) a lack of testing and coverage feedback, which limits improvements in code coverage; (3) the repetition suppression problem, which causes LLM-based repair and generation attempts to fail. To address these limitations, we propose TestART, a novel unit test generation method. TestART improves LLM-based unit testing via co-evolution of automated generation and repair iteration, representing a significant advancement in automated unit test generation. TestART leverages a template-based repair strategy to effectively fix bugs in LLM-generated test cases for the first time. Meanwhile, TestART extracts coverage information from successful test cases and uses it as coverage-guided testing feedback. It also incorporates positive prompt injection to prevent repetition suppression, thereby enhancing the sufficiency of the final test cases. This synergy between generation and repair elevates the correctness and sufficiency of the produced test cases significantly beyond previous methods. In comparative experiments, TestART demonstrates an 18% improvement in pass rate and a 20% enhancement in coverage across three types of datasets compared to baseline models. Additionally, it achieves better coverage rates than EvoSuite with only half the number of test cases. These results demonstrate TestART's superior ability to produce high-quality unit test cases by harnessing the power of LLMs while overcoming their inherent flaws.
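The co-evolution loop the abstract describes (generate, run, template-repair failures, then feed coverage gaps and passing tests back into the next prompt) can be sketched in miniature. This is a minimal illustration, not TestART's implementation: the repair rules, function names (`generate`, `run_with_coverage`), and feedback format are all assumptions made for the example.

```python
import re

# Illustrative repair templates mapping common LLM test mistakes to fixes.
# TestART's actual template set is not reproduced here; these rules are
# invented stand-ins showing the fixed-pattern repair idea.
REPAIR_TEMPLATES = [
    # hallucinated JUnit-style name in a Python test
    (re.compile(r"\bassertEquals\("), "assertEqual("),
    # weak boolean assertion upgraded to an equality assertion
    (re.compile(r"\bself\.assertTrue\((.+?)\s*==\s*(.+?)\)"),
     r"self.assertEqual(\1, \2)"),
]

def template_repair(test_code: str) -> str:
    """Apply each fixed repair rule in turn instead of re-prompting the LLM."""
    for pattern, replacement in REPAIR_TEMPLATES:
        test_code = pattern.sub(replacement, test_code)
    return test_code

def co_evolve(generate, run_with_coverage, target_lines, max_rounds=4):
    """Alternate generation and repair for up to max_rounds.

    generate(feedback) -> candidate test source (stands in for the LLM call);
    run_with_coverage(test) -> (passed: bool, covered: set of line numbers).
    """
    passing, uncovered = [], set(target_lines)
    feedback = ""  # injected into the next generation prompt
    for _ in range(max_rounds):
        candidate = generate(feedback)
        passed, covered = run_with_coverage(candidate)
        if not passed:  # try template-based repair before discarding
            candidate = template_repair(candidate)
            passed, covered = run_with_coverage(candidate)
        if passed:
            passing.append(candidate)
            uncovered -= covered
        if not uncovered:
            break
        # Coverage-guided feedback plus positive prompt injection: keep
        # successful tests visible to the model and name the coverage gaps.
        feedback = (f"Keep these passing tests as examples: {passing}. "
                    f"Target these uncovered lines: {sorted(uncovered)}.")
    return passing, uncovered
```

With stub `generate`/`run_with_coverage` functions, the loop accumulates passing tests round by round and stops once every target line is covered, which is the co-evolution behavior the abstract claims.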