SePO: Self-Evolving Prompt Agent for System Prompt Optimization

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing prompt optimization methods give the prompt agent a manually designed, fixed system prompt, which limits overall performance. This work proposes Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimizable variable. SePO uses a self-referential architecture: a single prompt agent co-optimizes the task agents' system prompts and its own system prompt through open-ended evolutionary search, keeping an archive of candidate prompts as stepping stones. Training combines multi-task pre-training with target-task fine-tuning and requires no modification of the underlying model. Across five benchmarks, SePO improves average accuracy by 4.49 percentage points over Manual-CoT and outperforms TextGrad and MetaSPO. The optimization skill acquired during pre-training also generalizes to tasks outside the pre-training mixture.
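The self-referential archive loop can be made concrete with a toy Python sketch. It is not SePO's implementation, and every name in it is hypothetical: prompts are plain strings, `rewrite` stands in for the prompt agent's LLM call by appending one instruction from a made-up `HINTS` list (preferring hints its own system prompt mentions), `evaluate` stands in for task accuracy, and uniform parent selection from the archive is an assumed choice.

```python
import random
from dataclasses import dataclass

# Hypothetical instruction vocabulary; a real prompt agent would write free text.
HINTS = ["Think step by step.", "Verify the answer.", "Check edge cases.", "State assumptions."]


@dataclass(frozen=True)
class Candidate:
    agent_prompt: str  # the prompt agent's own system prompt (also evolved)
    task_prompt: str   # the task agent's system prompt
    score: float       # toy fitness of task_prompt


def rewrite(agent_prompt: str, target: str, rng: random.Random) -> str:
    """Toy stand-in for the prompt agent's LLM call: append one missing hint,
    preferring hints that the agent's own system prompt already mentions."""
    missing = [h for h in HINTS if h not in target]
    if not missing:
        return target
    preferred = [h for h in missing if h in agent_prompt] or missing
    return f"{target} {rng.choice(preferred)}"


def evaluate(task_prompt: str) -> float:
    """Toy fitness: fraction of HINTS present (a real system would score a dev set)."""
    return sum(h in task_prompt for h in HINTS) / len(HINTS)


def evolve(seed_agent: str, seed_task: str, steps: int, seed: int = 0) -> list[Candidate]:
    rng = random.Random(seed)
    archive = [Candidate(seed_agent, seed_task, evaluate(seed_task))]
    for _ in range(steps):
        # Any archived candidate may be a parent (uniform choice is an assumption),
        # so weaker prompts stay available as stepping stones.
        parent = rng.choice(archive)
        # Self-referential step: one agent edits the task prompt and its own prompt.
        child_task = rewrite(parent.agent_prompt, parent.task_prompt, rng)
        child_agent = rewrite(parent.agent_prompt, parent.agent_prompt, rng)
        archive.append(Candidate(child_agent, child_task, evaluate(child_task)))
    return archive


archive = evolve("You improve system prompts.", "Solve the problem.", steps=20)
best = max(archive, key=lambda c: c.score)
print(len(archive), best.score, best.task_prompt)
```

Every child is appended even when it scores no better than its parent; that is what makes the archive open-ended, whereas a greedy loop would only ever extend the current best.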

📝 Abstract

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-engineered and fixed. We propose Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimization target alongside task agents' system prompts. SePO adopts a self-referential design: a single prompt agent improves both the task agents' system prompts and its own under an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones. Training proceeds in two stages: pre-training evolves the prompt agent on a multi-task pool, and fine-tuning then applies it to a target task. Across five benchmarks spanning math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku), SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, improving average accuracy by 4.49 points over Manual-CoT. The prompt optimization skill acquired in pre-training also generalizes to tasks outside the pre-training mixture, indicating that the prompt agent learns a transferable skill rather than memorizing per-task prompts.
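The two training stages can be illustrated with a second toy sketch, again under loud assumptions: `requires`, `rewrite`, `agent_fitness`, `pretrain`, and `finetune` are hypothetical names, fitness is a string-matching proxy rather than benchmark accuracy, and the fine-tuning stage uses greedy hill climbing for brevity instead of an archive search. Pre-training scores the prompt agent's own system prompt by how much its edits help across a task pool; fine-tuning then reuses the best evolved agent on a task that is not in that pool.

```python
import random
from typing import Callable

# Hypothetical instruction vocabulary; a real prompt agent would write free text.
HINTS = ["Think step by step.", "Verify the answer.", "Check edge cases.", "State assumptions."]
Task = Callable[[str], float]  # scores a task agent's system prompt in [0, 1]


def requires(*needed: str) -> Task:
    """Toy task: the fraction of `needed` hints present in the prompt."""
    return lambda prompt: sum(h in prompt for h in needed) / len(needed)


def rewrite(agent_prompt: str, target: str, rng: random.Random) -> str:
    """Toy stand-in for one prompt-agent LLM call: append one missing hint,
    preferring hints the agent's own system prompt already mentions."""
    missing = [h for h in HINTS if h not in target]
    if not missing:
        return target
    preferred = [h for h in missing if h in agent_prompt] or missing
    return f"{target} {rng.choice(preferred)}"


def agent_fitness(agent_prompt: str, pool: list[Task], base: str,
                  rng: random.Random, trials: int = 8) -> float:
    """Mean task score after one edit by this agent, averaged over the pool."""
    total = sum(task(rewrite(agent_prompt, base, rng)) for task in pool for _ in range(trials))
    return total / (len(pool) * trials)


def pretrain(seed_agent: str, pool: list[Task], base: str, steps: int,
             rng: random.Random) -> str:
    """Stage 1: evolve the prompt agent's own system prompt on a multi-task pool."""
    archive = [(agent_fitness(seed_agent, pool, base, rng), seed_agent)]
    for _ in range(steps):
        _, parent = rng.choice(archive)
        child = rewrite(parent, parent, rng)  # self-referential edit of its own prompt
        archive.append((agent_fitness(child, pool, base, rng), child))
    return max(archive)[1]


def finetune(agent_prompt: str, task: Task, base: str, steps: int,
             rng: random.Random) -> tuple[float, str]:
    """Stage 2: apply the pre-trained agent to one target task.
    Accepts non-worsening edits so the search can move past plateaus."""
    best_score, best_prompt = task(base), base
    for _ in range(steps):
        candidate = rewrite(agent_prompt, best_prompt, rng)
        score = task(candidate)
        if score >= best_score:
            best_score, best_prompt = score, candidate
    return best_score, best_prompt


rng = random.Random(0)
base = "Solve the problem."
pool = [requires("Think step by step.", "Verify the answer."),
        requires("Check edge cases.", "Verify the answer.")]
agent = pretrain("You improve system prompts.", pool, base, steps=10, rng=rng)
unseen = requires("Verify the answer.", "State assumptions.")  # not in the pool
score, prompt = finetune(agent, unseen, base, steps=3, rng=rng)
print(agent)
print(score, prompt)
```

Whether the evolved agent's preferences help on `unseen` in this toy depends on which hints pre-training happened to favor; the sketch only shows how the two stages hand off, not how well SePO transfers.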

Problem

Research questions and friction points this paper is trying to address.

system prompt optimization

prompt agent

self-evolving

evolutionary search

model-agnostic instructions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Evolving Prompt Optimization

System Prompt Optimization

Prompt Agent