Combining TSL and LLM to Automate REST API Testing: A Comparative Study

📅 2025-09-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
REST API testing faces challenges including high distributed-system complexity, diverse usage scenarios, substantial manual effort for test design, and insufficient coverage. This paper proposes an automated test case generation method integrating a Test Specification Language (TSL) with large language models (LLMs), leveraging customized prompt engineering and an end-to-end automation pipeline to efficiently transform OpenAPI specifications into executable test cases. Its key contribution is the novel use of TSL as a structured intermediate representation that bridges LLM-based reasoning and formal test semantics, thereby enhancing generation accuracy and verifiability. Experimental evaluation demonstrates that Claude 3.5 Sonnet significantly outperforms other mainstream LLMs in test generation success rate, API path coverage, and mutation score—achieving an average 37% improvement in coverage and reducing manual authoring effort by approximately 62%.

Technology Category

Application Category

📝 Abstract
The effective execution of tests for REST APIs remains a considerable challenge for development teams, driven by the inherent complexity of distributed systems, the multitude of possible scenarios, and the limited time available for test design. Exhaustive testing of all input combinations is impractical, often resulting in undetected failures, high manual effort, and limited test coverage. To address these issues, we introduce RestTSLLM, an approach that uses Test Specification Language (TSL) in conjunction with Large Language Models (LLMs) to automate the generation of test cases for REST APIs. The approach targets two core challenges: the creation of test scenarios and the definition of appropriate input data. The proposed solution integrates prompt engineering techniques with an automated pipeline to evaluate various LLMs on their ability to generate tests from OpenAPI specifications. The evaluation focused on metrics such as success rate, test coverage, and mutation score, enabling a systematic comparison of model performance. The results indicate that the best-performing LLMs - Claude 3.5 Sonnet (Anthropic), Deepseek R1 (Deepseek), Qwen 2.5 32b (Alibaba), and Sabia 3 (Maritaca) - consistently produced robust and contextually coherent REST API tests. Among them, Claude 3.5 Sonnet outperformed all other models across every metric, emerging in this study as the most suitable model for this task. These findings highlight the potential of LLMs to automate the generation of tests based on API specifications.
Problem

Research questions and friction points this paper is trying to address.

Automating REST API test case generation
Addressing test scenario creation challenges
Overcoming exhaustive input combination testing limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining TSL and LLM for REST API test automation
Using prompt engineering with automated evaluation pipeline
Generating tests from OpenAPI specifications via LLMs
🔎 Similar Papers
No similar papers found.
T
Thiago Barradas
Universidade Federal Fluminense, Niterói, RJ, Brazil
Aline Paes
Aline Paes
Universidade Federal Fluminense
Artificial IntelligenceMachine LearningNatural Language ProcessingRelational Learning
V
Vânia de Oliveira Neves
Universidade Federal Fluminense, Niterói, RJ, Brazil