The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI-based test generation approaches typically produce static, one-off outputs that often yield invalid, redundant, or non-executable test cases, and they lack mechanisms for execution feedback. This work proposes the first closed-loop, self-correcting multi-agent testing framework, in which three specialized agents, responsible for test generation, execution analysis, and review-based optimization, collaborate to enable feedback-driven iterative refinement. The framework integrates multi-agent collaboration with continuous learning, leveraging a sandboxed execution environment, fine-grained failure diagnostics, coverage-aware reinforcement signals, and a CI/CD-compatible pipeline to support automatic regeneration and repair of test cases. Evaluated on microservice applications, the approach reduces invalid tests by up to 60% and improves coverage by 30%, substantially decreasing the need for manual intervention.

📝 Abstract
Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-executable tests due to the lack of execution-aware feedback. This paper introduces an agentic multi-model testing framework: a closed-loop, self-correcting system in which a Test Generation Agent, an Execution and Analysis Agent, and a Review and Optimization Agent collaboratively generate, execute, analyze, and refine tests until convergence. By using sandboxed execution, detailed failure reporting, and iterative regeneration or patching of failing tests, the framework autonomously improves test quality and expands coverage. Integrated into a CI/CD-compatible pipeline, it leverages reinforcement signals from coverage metrics and execution outcomes to guide refinement. Empirical evaluations on microservice-based applications show up to a 60% reduction in invalid tests, a 30% coverage improvement, and significantly reduced human effort compared to single-model baselines, demonstrating that multi-agent, feedback-driven loops can evolve software testing into an autonomous, continuously learning quality assurance ecosystem for self-healing, high-reliability codebases.
Problem

Research questions and friction points this paper is trying to address.

software testing
AI-based test generation
execution feedback
test validity
test coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic testing
multi-agent system
self-correcting framework
execution-aware feedback
autonomous software testing
Saba Naqvi
MUFG Bank
Mohammad Baqar
Software Engineer at Cisco Systems Inc
Nawaz Ali Mohammad
University of North Carolina