The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation

📅 2025-01-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low quality and repetitive errors in test case generation by large language models (LLMs) caused by generic human-crafted prompts, this paper proposes an automated, LLM-specific prompt optimization method. Our approach introduces three key contributions: (1) a novel domain-knowledge-guided prompt evolution framework enabling personalized prompt search for individual LLMs; (2) a code-semantic-aware feedback evaluator and a multi-LLM-adaptive prompt encoder; and (3) a Bayesian optimization–based prompt evolution algorithm enhanced with domain knowledge and a diversity-preserving mechanism. Extensive experiments across eight mainstream LLMs demonstrate that our method achieves average improvements of 32.7% in test case accuracy and 28.4% in fault coverage over state-of-the-art baselines, confirming its effectiveness and generalizability.
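The summary describes an evolutionary search over prompts guided by feedback from generated test cases. As a rough illustration of that general idea (not the paper's actual algorithm), the loop below mutates candidate prompts and keeps the highest-scoring ones; the mutation snippets and the scoring heuristic here are invented stand-ins for the paper's domain-knowledge-guided operators and code-semantic-aware evaluator.

```python
import random

# Illustrative mutation operators: appending test-design hints to a prompt.
# In the paper these would come from domain knowledge, not a fixed list.
MUTATIONS = [
    " Cover boundary values.",
    " Include negative test cases.",
    " Assert on exception paths.",
]

def score(prompt: str) -> float:
    """Stand-in for real feedback (e.g., running generated tests and
    measuring coverage); here we simply reward more specific prompts."""
    return len(set(prompt.split()))

def mutate(prompt: str, rng: random.Random) -> str:
    """Produce a child prompt by appending one random hint."""
    return prompt + rng.choice(MUTATIONS)

def evolve(seed_prompt: str, generations: int = 5, pop_size: int = 4,
           rng: random.Random = None) -> str:
    """Greedy evolutionary prompt search: mutate, score, keep the best."""
    rng = rng or random.Random(0)
    population = [seed_prompt]
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        # Keep the top pop_size candidates across parents and children.
        population = sorted(population + children, key=score,
                            reverse=True)[:pop_size]
    return population[0]

best = evolve("Write unit tests for the following function.")
```

A real system would replace `score` with execution feedback (compile rate, coverage, fault detection) and would search per target LLM, since the optimal prompt differs across models.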

📝 Abstract
Test cases are essential for validating the reliability and quality of software applications. Recent studies have demonstrated the capability of Large Language Models (LLMs) to generate useful test cases for given source code. However, existing work primarily relies on human-written plain prompts, which often leads to suboptimal results, since the performance of LLMs is highly sensitive to the prompts used. Moreover, these approaches apply the same prompt to all LLMs, overlooking the fact that different LLMs may be best suited to different prompts. Given the wide variety of possible prompt formulations, automatically discovering the optimal prompt for each LLM presents a significant challenge. Although automated prompt optimization methods exist in the natural language processing field, they struggle to produce effective prompts for the test case generation task. First, these methods iteratively optimize prompts by simply combining and mutating existing ones without proper guidance, resulting in prompts that lack diversity and tend to repeat the same errors in the generated test cases. Second, the prompts generally lack domain contextual knowledge, limiting LLMs' performance on the task.
Problem

Research questions and friction points this paper is trying to address.

Software Testing
Large Language Models
Test Case Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-Optimization
Prompt Tuning
Test Case Generation
Shuzheng Gao
The Chinese University of Hong Kong
Code Intelligence
Software Engineering
Large Language Models
Chaozheng Wang
The Chinese University of Hong Kong
Software Engineering
Artificial Intelligence
Cuiyun Gao
Harbin Institute of Technology, Shenzhen, China
Xiaoqian Jiao
Harbin Institute of Technology, Shenzhen, China
Chun Yong Chong
Monash University
Software Engineering
Shan Gao
Independent Researcher, China
Michael Lyu
The Chinese University of Hong Kong, Hong Kong, China