🤖 AI Summary
To address the lack of interpretability and controllability in large language model (LLM) prompt optimization, this paper proposes the Gradient-inspired Prompt Optimizer (GPO). GPO draws a systematic analogy between prompt optimization and gradient descent, designing an interpretable and controllable iterative mechanism along two dimensions: (i) *update direction*, determined by retrieving relevant prompts from the optimization trajectory, and (ii) *update method*, combining generation-based refinement with a cosine-decayed constraint on edit distance. Crucially, GPO performs meta-optimization over prompts: it operates entirely at the prompt level and requires no fine-tuning of model parameters. On Big-Bench Hard and MMLU, GPO brings additional improvements of up to 56.8% and 62.6%, respectively, over baseline methods, substantially enhancing zero-shot and few-shot reasoning. This work offers a principled, gradient-motivated paradigm for prompt engineering.
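To make the loop concrete, below is a minimal Python sketch of the iterative mechanism described above, under stated assumptions: the helper callables `call_llm` and `evaluate_prompt`, the top-k retrieval over the trajectory, and the word-level edit budget are illustrative choices, not the authors' implementation.

```python
import math


def cosine_edit_budget(step: int, total_steps: int, max_edits: int) -> int:
    """Cosine-decayed cap on how much the prompt may change at this step:
    large rewrites early, small refinements late."""
    return max(1, round(max_edits * 0.5 * (1 + math.cos(math.pi * step / total_steps))))


def optimize_prompt(seed_prompt, call_llm, evaluate_prompt,
                    total_steps=10, k=3, max_edits=50):
    """Hypothetical GPO-style loop: retrieval supplies the update direction,
    LLM-based generation performs the update under an edit-distance budget."""
    trajectory = [(seed_prompt, evaluate_prompt(seed_prompt))]  # (prompt, score)
    for step in range(total_steps):
        # Update direction: retrieve the k best-scoring prompts seen so far.
        retrieved = sorted(trajectory, key=lambda ps: ps[1], reverse=True)[:k]
        budget = cosine_edit_budget(step, total_steps, max_edits)
        # Update method: ask the optimizer LLM for a refined prompt while
        # constraining how far it may drift from the current best prompt.
        meta_prompt = (
            "Prior prompts and their task scores:\n"
            + "\n".join(f"{score:.3f}: {prompt}" for prompt, score in retrieved)
            + f"\nWrite an improved prompt, changing at most {budget} words "
            "relative to the best prompt above."
        )
        candidate = call_llm(meta_prompt)
        trajectory.append((candidate, evaluate_prompt(candidate)))
    return max(trajectory, key=lambda ps: ps[1])[0]
```

In practice, `evaluate_prompt` would score each candidate on a held-out set of task examples, and `call_llm` would query the optimizer model.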
📝 Abstract
Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective on the design of LLM-based prompt optimizers by drawing an analogy with gradient-based model optimizers. To connect the two approaches, we identify two pivotal factors in model parameter learning: update direction and update method. By systematically analyzing a rich set of improvement strategies along these two aspects, we develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, it first retrieves relevant prompts from the optimization trajectory as the update direction. It then applies a generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 62.6% on MMLU over baseline methods. The code is available at https://github.com/RUCAIBox/GPO.
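The abstract names the cosine-based decay but not its exact form; a standard cosine-annealing schedule for the edit-distance cap $d_t$ at step $t$ of $T$ total steps (an assumed form, not quoted from the paper) would be

$$
d_t = d_{\min} + \tfrac{1}{2}\,(d_{\max} - d_{\min})\left(1 + \cos\frac{\pi t}{T}\right),
$$

which permits large rewrites early ($d_t \approx d_{\max}$) and forces increasingly conservative edits as optimization converges ($d_t \to d_{\min}$).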