APEX: Automated Prompt Engineering eXpert with Dynamic Data Selection

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models are highly sensitive to prompt formulation, yet existing automated prompt optimization methods are data-inefficient because they treat the development data as static. This work proposes APEX, a framework that co-designs dynamic data selection with prompt optimization. APEX stratifies the data into Easy, Hard, and Mixed tiers and prioritizes the Mixed tier, where the model's performance is inconsistent. From that tier it identifies two high-leverage subsets, the “addressable frontier” (used to generate informative mutations) and the “rank-sensitive frontier” (used to distinguish candidate quality), which guide an evolutionary prompt search. Under a fixed budget of 5,000 evaluation calls across IFBench, SimpleQA Verified, and FACTS Grounding, APEX outperforms the initial prompt by an average of 11.2% on Gemini 2.5 Flash and 6.8% on Gemma 3 27B.

📝 Abstract

Large Language Models are highly sensitive to prompt formulation, necessitating automatic prompt optimization to unlock their full potential. While evolutionary algorithms have emerged as the dominant paradigm, they suffer from a critical bottleneck: data efficiency. Current methods treat the development dataset as a static benchmark, wasting a significant share of the compute budget on uninformative data. In this work, we introduce APEX (Automatic Prompt Engineering eXpert), a novel framework that optimizes data usage alongside the prompt search. APEX dynamically stratifies the dataset into Easy, Hard, and Mixed tiers based on the optimization lineage. By prioritizing the Mixed tier, which contains the data on which the LLM's performance is mixed, we identify two high-leverage subsets: the addressable frontier for generating informative mutations and the rank-sensitive frontier for distinguishing candidate quality. We evaluate APEX across three diverse benchmarks: IFBench, SimpleQA Verified, and FACTS Grounding. Under a fixed budget of 5,000 evaluation calls, APEX outperforms the initial prompt by an average of 11.2% on Gemini 2.5 Flash and 6.8% on Gemma 3 27B, demonstrating that a data-centric approach is key to efficient and effective prompt optimization.
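
To make the tiering idea concrete, here is a minimal Python sketch of one way a dataset could be split into Easy, Hard, and Mixed tiers from the per-example outcomes of the prompts in a lineage. The abstract does not give the exact criteria, so the rule used here (all prompts pass, no prompt passes, anything in between) is an assumption, and the function name `stratify` and the pass/fail input format are hypothetical rather than taken from the paper.

```python
from typing import Dict, List


def stratify(lineage_results: Dict[str, List[bool]]) -> Dict[str, List[int]]:
    """Split example indices into easy / hard / mixed tiers.

    lineage_results maps a prompt id to a list of pass/fail outcomes,
    one per development example. Assumed rule (not from the paper):
      easy  = every prompt in the lineage passes the example
      hard  = no prompt in the lineage passes the example
      mixed = at least one pass and at least one fail
    """
    tiers: Dict[str, List[int]] = {"easy": [], "hard": [], "mixed": []}
    rows = list(lineage_results.values())
    if not rows:
        return tiers

    n_examples = len(rows[0])
    if any(len(row) != n_examples for row in rows):
        raise ValueError("every prompt must be scored on the same examples")

    for i in range(n_examples):
        outcomes = [row[i] for row in rows]
        if all(outcomes):
            tiers["easy"].append(i)
        elif not any(outcomes):
            tiers["hard"].append(i)
        else:
            tiers["mixed"].append(i)
    return tiers


# Toy lineage: two prompts scored on four examples.
toy_results = {
    "prompt_v0": [True, False, True, False],
    "prompt_v1": [True, False, False, True],
}
print(stratify(toy_results))
```

In this toy run, example 0 lands in the easy tier, example 1 in the hard tier, and examples 2 and 3 in the mixed tier. Under the assumed rule, only the mixed examples separate the two prompts, which is the intuition behind spending the evaluation budget there.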

Problem

Research questions and friction points this paper is trying to address.

prompt optimization

data efficiency

large language models

evolutionary algorithms

dynamic data selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic data selection

prompt optimization

data efficiency