🤖 AI Summary
Existing evolutionary prompt search methods suffer from fragile operators and inefficient evaluation. This paper proposes a staged evolutionary prompt optimization framework: first, decoupling the mutation, selection, and elimination steps to stabilize the search; second, using a large language model as a judge (LLM-as-Judge) for fine-grained, low-cost assessment of prompt quality; and third, adapting the evolutionary operator with human feedback to better steer the search. Experiments show that the method significantly improves prompt optimization quality across multiple benchmark tasks (an average +12.3% accuracy gain) while reducing evaluation overhead by 57%. The optimized prompts also transfer well across tasks. The implementation is publicly available.
📝 Abstract
Evolutionary prompt optimization has proven effective at refining prompts for LLMs. However, existing approaches lack robust operators and efficient evaluation mechanisms. In this work, we propose several key improvements to evolutionary prompt optimization, some of which generalize to prompt optimization more broadly: 1) decomposing evolution into distinct steps to make the search more stable and controllable, 2) introducing an LLM-based judge to verify the evolved prompts, 3) integrating human feedback to refine the evolutionary operator, and 4) developing more efficient evaluation strategies that preserve performance while reducing computational overhead. Our approach improves both optimization quality and efficiency. We release our code, enabling prompt optimization on new tasks and facilitating further research in this area.
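To make the decomposition concrete, the loop described above can be sketched as separate, swappable steps. This is an illustrative sketch, not the paper's implementation: the `mutate` and `judge` functions are stand-ins (a real system would call an LLM for both the mutation operator and the LLM-as-Judge scoring), and all names and hyperparameters here are hypothetical.

```python
import random

def mutate(prompt, rng):
    """Mutation step (stub): apply a small textual edit.
    A real system would prompt an LLM with a mutation instruction."""
    suffixes = [" Think step by step.", " Be concise.", " Answer carefully."]
    return prompt + rng.choice(suffixes)

def judge(prompt):
    """LLM-as-Judge step (stub): score a candidate prompt.
    A real system would ask an LLM to rate the prompt on task criteria."""
    words = prompt.split()
    return len(set(words)) / (len(words) + 1)

def evolve(seed_prompt, generations=3, branching=4, keep=2, seed=0):
    """Decoupled evolutionary loop: mutation, judging, selection,
    and elimination are distinct stages that can be tuned independently."""
    rng = random.Random(seed)
    population = [seed_prompt]
    for _ in range(generations):
        # Mutation: expand the population with edited variants.
        children = [mutate(p, rng) for p in population for _ in range(branching)]
        # Judging: score every candidate (parents kept for elitism).
        scored = sorted(population + children, key=judge, reverse=True)
        # Selection + elimination: retain only the top-scoring prompts.
        population = scored[:keep]
    return population[0]

best = evolve("Summarize the document.")
```

Separating the stages this way means, for example, that the judge can be swapped for a cheaper scorer or the elimination rule tightened without touching the mutation operator, which is the kind of control the decomposition is meant to enable.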