🤖 AI Summary
Prompt optimization for large language models faces an inherent trade-off between task performance (e.g., accuracy) and context length (in tokens); existing approaches typically pursue a single objective and fail to jointly balance efficiency and effectiveness. This paper introduces a multi-objective semantic evolutionary framework for prompt engineering that integrates Pareto optimization into prompt design. Using semantic-aware mutation and crossover operators, it searches for a Pareto-optimal set of prompts that jointly optimize accuracy and prompt length. Evaluated on Portuguese sentiment analysis with the Sabiazinho-3 model, the method matches the best baseline's peak accuracy of 0.97 while reducing token count by 31%. Crucially, it yields interpretable, production-ready trade-off solutions, letting practitioners select prompts according to task-specific constraints.
📝 Abstract
Prompt engineering is crucial for unlocking the potential of Large Language Models (LLMs). However, manual prompt design is often complex, non-intuitive, and time-consuming, so automatic prompt optimization has emerged as a research area. A significant challenge in prompt optimization is managing the inherent trade-off between task performance, such as accuracy, and context size. Most existing automated methods focus on a single objective, typically performance, and thus fail to explore the critical spectrum between efficiency and effectiveness. This paper introduces MOPrompt, a novel Evolutionary Multi-objective Optimization (EMO) framework designed to optimize prompts for both accuracy and context size (measured in tokens) simultaneously. Our framework maps the Pareto front of prompt solutions, presenting practitioners with a set of trade-offs between context size and performance, a crucial tool for deploying LLMs in real-world applications. We evaluate MOPrompt on a sentiment analysis task in Portuguese, using Gemma-2B and Sabiazinho-3 as evaluation models. Our results show that MOPrompt substantially outperforms the baseline framework. For Sabiazinho-3, MOPrompt identifies a prompt that achieves the same peak accuracy (0.97) as the best baseline solution but with a 31% reduction in token length.
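The Pareto-front idea at the core of the abstract can be sketched as a simple non-dominated filter over candidate prompts: a prompt survives only if no other prompt is at least as accurate *and* at least as short. This is an illustrative sketch, not the paper's implementation; the candidate prompts, accuracy values, and token counts below are made up for the example.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    prompt: str
    accuracy: float  # objective 1: maximize
    tokens: int      # objective 2: minimize

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.tokens < b.tokens
    return no_worse and strictly_better

def pareto_front(pop: list[Candidate]) -> list[Candidate]:
    """Return the non-dominated subset of the population."""
    return [c for c in pop if not any(dominates(o, c) for o in pop)]

# Toy population (values are illustrative, not from the paper).
pop = [
    Candidate("long detailed prompt", accuracy=0.97, tokens=120),
    Candidate("compact prompt",       accuracy=0.97, tokens=83),
    Candidate("terse prompt",         accuracy=0.91, tokens=40),
]

for c in pareto_front(pop):
    print(c.prompt, c.accuracy, c.tokens)
# "long detailed prompt" is dominated by "compact prompt" (same accuracy, fewer tokens),
# so the front keeps only "compact prompt" and "terse prompt".
```

In a full EMO loop, this filter would be applied each generation to a population produced by mutation and crossover, yielding the accuracy-versus-length trade-off curve the paper reports.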