🤖 AI Summary
Manual unit test maintenance is inefficient, and existing automated approaches predominantly focus on repairing broken tests while neglecting test augmentation for new functionality; they also rely on rule-based context extraction without validation mechanisms, leading to poor correctness. This paper introduces TESTUPDATER, the first context-aware, LLM-driven framework that unifies test repair and augmentation in a single paradigm, featuring an error-type-aware iterative refinement mechanism and a customized prompting strategy. The paper also constructs UPDATES4J, a new benchmark of 195 real-world code-change scenarios. Experiments show that TESTUPDATER achieves a 94.4% compilation pass rate and an 86.7% test pass rate, outperforming the state-of-the-art baseline SYNTER by 15.9% and 20.0%, respectively, while significantly improving code coverage.
📝 Abstract
Unit testing is critical for ensuring software quality and system stability. The current practice of manually maintaining unit tests is inefficient and risks delayed or overlooked fixes, so an automated approach is needed to update unit tests just in time, with the capability to both repair and enhance them. However, existing automated test maintenance methods primarily focus on repairing broken tests, neglecting the scenario of enhancing existing tests to verify new functionality. Moreover, because they rely on rule-based context collection and lack verification mechanisms, existing approaches struggle to handle complex code changes and often produce test cases of low correctness. To address these challenges, we propose TESTUPDATER, a novel LLM-based approach that enables automated just-in-time test updates in response to production code changes. TESTUPDATER first leverages the LLM to analyze code changes and identify relevant context, which it then extracts and filters. Then, through carefully designed prompts, TESTUPDATER guides the LLM step by step to handle various types of code changes and introduce new dependencies, enabling both test repair and enhancement. Finally, we introduce an error-type-aware iterative refinement mechanism that executes the LLM-updated tests and repairs failures, significantly improving the overall correctness of test updates. Since existing test repair datasets lack test enhancement scenarios, we further construct a new benchmark, UPDATES4J, with 195 real-world samples from 7 projects. Experimental results show that TESTUPDATER achieves a compilation pass rate of 94.4% and a test pass rate of 86.7%, outperforming the state-of-the-art method SYNTER by 15.9% and 20.0%, respectively. Furthermore, TESTUPDATER achieves 12.9% higher branch coverage and 15.2% higher line coverage than SYNTER.
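The error-type-aware iterative refinement described above can be pictured as a simple execute-and-repair loop. The sketch below is purely illustrative: the function and parameter names (`refine_tests`, `run_tests`, `llm`, `max_rounds`) are assumptions for exposition, not the paper's actual interface, and the real TESTUPDATER pipeline involves richer context extraction and prompt design.

```python
def refine_tests(llm, initial_tests, run_tests, max_rounds=3):
    """Hypothetical sketch of an error-type-aware refinement loop.

    Repeatedly executes the LLM-updated tests; on failure, builds a
    repair prompt tailored to the observed error category (e.g., a
    compilation error vs. a runtime assertion failure) and asks the
    LLM for a fixed version.
    """
    tests = initial_tests
    for _ in range(max_rounds):
        result = run_tests(tests)  # e.g., compile + execute the test class
        if result["status"] == "pass":
            return tests
        # The prompt is conditioned on the error type, so the LLM gets
        # category-specific guidance rather than a generic "fix this".
        prompt = (
            f"Error type: {result['error_type']}\n"
            f"Message: {result['message']}\n"
            f"Current test code:\n{tests}\n"
            "Update the test so it compiles and passes."
        )
        tests = llm(prompt)
    return tests  # best effort after max_rounds attempts
```

In this framing, the validation step that prior rule-based approaches lack is the `run_tests` call: every candidate update is executed before being accepted, and failures feed back into the next repair round.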