🤖 AI Summary
Large language models (LLMs) exhibit high sensitivity to prompts, yet existing automatic prompt optimization (APO) methods typically rely on high-quality labeled data, which is often unavailable in practice. This work proposes the Prompt Duel Optimizer (PDO), the first APO framework that formulates prompt optimization as a dueling bandit problem, enabling label-free operation. Its core contributions are threefold: (1) a Double Thompson Sampling (D-TS) strategy for principled exploration-exploitation trade-offs; (2) a Top-Performer Guided Mutation mechanism that leverages the LLM itself as an unsupervised preference oracle for prompt refinement; and (3) inherent compatibility with partial supervision to mitigate label noise. PDO achieves significant improvements over strong baselines on BBH and MS MARCO. Ablation studies confirm the critical roles of D-TS and the mutation strategy in driving performance gains.
📝 Abstract
Large language models (LLMs) are highly sensitive to their input prompts, making prompt design a central challenge. While automatic prompt optimization (APO) reduces manual engineering, most approaches assume access to ground-truth references such as labeled validation data. In practice, however, collecting high-quality labels is costly and slow. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization. PDO formulates the problem as a dueling-bandit setting, where the supervision signal comes from pairwise preference feedback provided by an LLM judge. The framework combines Double Thompson Sampling (D-TS), which prioritizes informative prompt comparisons, with Top-Performer Guided Mutation, which expands the candidate pool by mutating high-performing prompts. PDO naturally operates in label-free settings and can also incorporate partial labels to mitigate judge noise. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently outperforms baseline methods. Ablation studies further demonstrate the effectiveness of both D-TS and prompt mutation.
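To make the duel-selection loop concrete, here is a minimal sketch of Double Thompson Sampling applied to prompt duels. This is not the paper's implementation; the class and method names (`PromptDuelDTS`, `select_duel`, `update`) are hypothetical, and the LLM-judge call that decides each duel's winner is abstracted away. The sketch keeps Beta posteriors over pairwise win probabilities, samples once to pick a Copeland-style leader, and samples again to pick its strongest challenger.

```python
import random

class PromptDuelDTS:
    """Sketch of Double Thompson Sampling over a fixed pool of prompts.
    wins[i][j] counts duels in which prompt i beat prompt j."""

    def __init__(self, n_prompts):
        self.n = n_prompts
        self.wins = [[0] * n_prompts for _ in range(n_prompts)]

    def _sample_pref(self, i, j):
        # Draw from the Beta(1 + wins, 1 + losses) posterior on P(i beats j).
        return random.betavariate(1 + self.wins[i][j], 1 + self.wins[j][i])

    def select_duel(self):
        # First arm: sample a full preference matrix and pick the prompt
        # with the most sampled pairwise wins (a Copeland-style winner).
        theta = [[self._sample_pref(i, j) if i != j else 0.5
                  for j in range(self.n)] for i in range(self.n)]
        copeland = [sum(theta[i][j] > 0.5 for j in range(self.n) if j != i)
                    for i in range(self.n)]
        first = max(range(self.n), key=lambda i: copeland[i])
        # Second arm: resample preferences against the leader and pick
        # the challenger most likely to beat it under the new samples.
        challengers = [j for j in range(self.n) if j != first]
        second = max(challengers, key=lambda j: self._sample_pref(j, first))
        return first, second

    def update(self, winner, loser):
        # Record the judge's pairwise preference for this duel.
        self.wins[winner][loser] += 1
```

In the label-free setting described above, the duel outcome fed to `update` would come from the LLM judge's pairwise preference; with partial labels, duels on labeled examples could instead be scored against ground truth.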