🤖 AI Summary
To address the low efficiency and poor scalability of manual test case and script authoring in protocol conformance testing, this paper proposes the first end-to-end automated testing framework. Our method leverages large language models (LLMs) to interpret protocol specifications, integrates keyword-driven test case generation with retrieval-augmented generation (RAG)-based test script synthesis, and incorporates an iterative self-correction mechanism guided by test feedback for code optimization. Crucially, LLMs are deeply embedded throughout the entire testing pipeline, departing from conventional approaches that rely on manual effort or rigid rule-based templates. Experimental evaluation across diverse protocol families demonstrates that our framework improves the test-code generation success rate (Pass@1) by 4.68–10.75× over pure LLM baselines. The results confirm substantial gains in automation level, accuracy, and generalizability.
📝 Abstract
Conformance testing is essential for ensuring that protocol implementations comply with their specifications. However, traditional testing approaches involve manually creating numerous test cases and scripts, making the process labor-intensive and inefficient. Recently, Large Language Models (LLMs) have demonstrated impressive text comprehension and code generation abilities, providing promising opportunities for automation. In this paper, we propose iPanda, the first end-to-end framework that leverages LLMs to automate protocol conformance testing. Given a protocol specification document and its implementation, iPanda first employs a keyword-based method to automatically generate comprehensive test cases. Then, it utilizes a code-based retrieval-augmented generation approach to effectively interpret the implementation and produce executable test code. To further enhance code quality, iPanda incorporates an iterative self-correction mechanism to refine generated test scripts iteratively based on execution feedback. Finally, by executing and analyzing the generated tests, iPanda systematically verifies compliance between implementations and protocol specifications. Comprehensive experiments on various protocols show that iPanda significantly outperforms pure LLM-based approaches, improving the success rate (Pass@1) of test-code generation by factors ranging from 4.675 to 10.751.
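The feedback-guided self-correction step described above can be sketched as a generate-execute-repair loop. This is a minimal illustration, not iPanda's actual implementation: all helper functions (`generate_test_code`, `run_test`, `self_correct`) are hypothetical stand-ins, and the "model" here is a toy that returns canned code.

```python
# Sketch of an iterative self-correction loop of the kind the abstract
# describes: generate test code, execute it, and feed failures back to
# the generator for refinement. All names here are hypothetical.

def generate_test_code(prompt: str) -> str:
    """Stand-in for an LLM call that emits test code for `prompt`."""
    # Toy behavior: the first draft is broken; once the prompt carries
    # error feedback, a corrected version is produced.
    if "NameError" in prompt:
        return "result = 1 + 1\nassert result == 2"
    return "result = 1 + undefined_var"  # deliberately broken first draft

def run_test(code: str):
    """Execute generated test code; return (passed, error_message)."""
    try:
        exec(code, {})
        return True, ""
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

def self_correct(spec: str, max_rounds: int = 3):
    """Refine generated test code using execution feedback."""
    prompt = spec
    code = generate_test_code(prompt)
    for _ in range(max_rounds):
        passed, error = run_test(code)
        if passed:
            return code, True
        # Append the failure to the prompt so the next generation
        # attempt can repair it (feedback-guided refinement).
        prompt = f"{spec}\nPrevious attempt failed with: {error}"
        code = generate_test_code(prompt)
    return code, run_test(code)[0]

final_code, ok = self_correct("test that addition works")
print(ok)
```

In this toy run the first draft raises a `NameError`, the error string is folded back into the prompt, and the second draft passes, mirroring the interactive refinement loop at a very small scale.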