Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

📅 2025-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study investigates whether large language models (LLMs) can automatically generate unit test cases that cover critical testing scenarios, specifically equivalence class partitioning and boundary value analysis, and compares their output against manually authored tests.

Method: The study proposes a structured prompt engineering approach tailored to testing semantics, integrating code context and formal requirement descriptions to improve the LLMs' comprehension of testing criteria. Evaluation combines quantitative metrics (test coverage, pass rate) with expert-driven qualitative analysis.

Contribution/Results: Experiments show that high-quality prompts, precise requirement specifications, and robust code implementations are pivotal for effective test generation. LLMs can produce functionally valid test cases, but they cannot yet replace human testers and require careful supervision and correction, especially for boundary logic and equivalence class completeness. The work provides a reproducible methodology and empirical evidence for using LLMs in software testing practice.
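To make the two testing criteria concrete, here is a minimal sketch of what equivalence-partition and boundary-value test cases look like. The function `classify_age`, its age ranges, and the case tables are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classify_age(age: int) -> str:
    """Hypothetical function under test (not from the study)."""
    if age < 0 or age > 120:
        raise ValueError("age out of range")
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


# Equivalence partitions: one representative value per valid class.
PARTITION_CASES = [(10, "minor"), (40, "adult"), (80, "senior")]

# Boundary values: the values on each side of every class edge.
BOUNDARY_CASES = [
    (0, "minor"), (17, "minor"),
    (18, "adult"), (64, "adult"),
    (65, "senior"), (120, "senior"),
]

# Invalid partitions: the first value outside each end of the valid range.
INVALID_CASES = [-1, 121]


def run_cases() -> int:
    """Run every case and return how many were checked."""
    checked = 0
    for value, expected in PARTITION_CASES + BOUNDARY_CASES:
        assert classify_age(value) == expected, (value, expected)
        checked += 1
    for value in INVALID_CASES:
        try:
            classify_age(value)
        except ValueError:
            checked += 1
        else:
            raise AssertionError(f"{value} should have been rejected")
    return checked
```

A generated test suite that omits one of these tables, for example the invalid partitions, is the kind of completeness gap a human reviewer would be expected to catch.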

📝 Abstract
The design and implementation of unit tests is a complex task that many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) to generate test cases automatically, comparing them with manual tests. An optimized prompt was developed that integrates code and requirements and covers critical cases such as equivalence partitions and boundary values. The strengths and weaknesses of LLMs versus trained programmers were compared through quantitative metrics and manual qualitative analysis. The results show that the effectiveness of LLMs depends on well-designed prompts, robust implementation, and precise requirements. Although flexible and promising, LLMs still require human supervision. This work highlights the importance of manual qualitative analysis as an essential complement to automation in unit test evaluation.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for automated unit test generation
Comparing LLM-generated tests with manual test cases
Assessing LLM effectiveness with optimized prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate test cases automatically
Optimized prompt integrates code and requirements
Human supervision complements LLM automation
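
The idea of a prompt that integrates code and requirements can be sketched as a simple template. The wording of the template and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions made for illustration; the study's actual prompt is not reproduced in this summary.

```python
from textwrap import dedent

# Hypothetical template: the paper's real prompt text is not shown here.
PROMPT_TEMPLATE = dedent("""\
    You are writing unit tests for the code below.

    Requirements:
    {requirements}

    Code under test:
    {code}

    Write test cases that cover:
    1. One representative input for every equivalence partition,
       including invalid partitions.
    2. The boundary values on both sides of each partition edge.
    """)


def build_prompt(source_code: str, requirements: str) -> str:
    """Combine the code and its requirements into a single prompt string."""
    return PROMPT_TEMPLATE.format(
        code=source_code.strip(),
        requirements=requirements.strip(),
    )


# Usage with made-up inputs.
example_prompt = build_prompt(
    source_code="def is_adult(age):\n    return age >= 18",
    requirements="An age of 18 or more counts as adult.",
)
```

The output of such a prompt would still go through the human review that the study describes as necessary.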