🤖 AI Summary
To address the labor-intensive, error-prone, and poorly transferable nature of manual prompt engineering across LLMs and tasks, this paper proposes the first framework to formalize prompt optimization as a structured AutoML problem. Methodologically: (1) it jointly searches over high-level prompting patterns (e.g., Chain-of-Thought, ReAct, ReWOO) and concrete prompt content; (2) it performs source-to-source optimization via PDL, a human-readable, editable, and reusable prompt description language; and (3) it combines successive halving with a standardized library of prompting patterns, enabling human-in-the-loop iterative refinement. Evaluated on three diverse tasks across six LLMs (8B–70B parameters), the approach achieves an average accuracy gain of 9.5±17.5 percentage points, with a maximum improvement of 68.9 pp, demonstrating that optimal prompting strategies are strongly model- and task-specific.
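The successive-halving search mentioned above can be sketched as follows. This is a minimal illustration, not AutoPDL's actual implementation: the candidate configurations, the toy quality values, and the `evaluate` function are all hypothetical stand-ins for evaluating a (prompting pattern, demonstrations) combination on a validation subset.

```python
import random

def successive_halving(candidates, evaluate, initial_budget=8):
    """Keep the best-scoring half of the candidates each round,
    doubling the per-candidate evaluation budget until one remains."""
    budget = initial_budget
    while len(candidates) > 1:
        scored = [(evaluate(c, budget), c) for c in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        candidates = [c for _, c in scored[: max(1, len(scored) // 2)]]
        budget *= 2  # survivors get evaluated on more examples
    return candidates[0]

# Hypothetical search space: (prompting pattern, number of demos) pairs.
space = [("zero-shot", 0), ("cot", 3), ("react", 3), ("rewoo", 5)]
# Made-up "true" accuracies, unknown to the search.
true_quality = dict(zip(space, [0.55, 0.72, 0.80, 0.65]))

def evaluate(config, budget):
    # Noisy accuracy estimate; noise shrinks as the budget grows,
    # mimicking evaluation on a larger validation sample.
    rng = random.Random((space.index(config), budget))
    return true_quality[config] + rng.uniform(-0.5, 0.5) / budget

best = successive_halving(list(space), evaluate)
```

Early rounds cheaply prune weak configurations on small samples; only the survivors receive larger, more reliable evaluations, which is what makes the combinatorial pattern-plus-content space tractable.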
📝 Abstract
The performance of large language models (LLMs) depends on how they are prompted, with choices spanning both the high-level prompting pattern (e.g., Zero-Shot, CoT, ReAct, ReWOO) and the specific prompt content (instructions and few-shot demonstrations). Manually tuning this combination is tedious, error-prone, and non-transferable across LLMs or tasks. Therefore, this paper proposes AutoPDL, an automated approach to discover good LLM agent configurations. Our method frames this as a structured AutoML problem over a combinatorial space of agentic and non-agentic prompting patterns and demonstrations, using successive halving to efficiently navigate this space. We introduce a library implementing common prompting patterns using the PDL prompt programming language. AutoPDL solutions are human-readable, editable, and executable PDL programs that use this library. This approach also enables source-to-source optimization, allowing human-in-the-loop refinement and reuse. Evaluations across three tasks and six LLMs (ranging from 8B to 70B parameters) show consistent accuracy gains ($9.5\pm17.5$ percentage points), up to 68.9 pp, and reveal that selected prompting strategies vary across models and tasks.