CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automatic prompt engineering methods rely on single-metric optimization and cannot simultaneously satisfy multi-aspect requirements such as text quality, diversity, and faithfulness. This paper proposes CriSPO, a multi-aspect critique-and-suggestion-guided automatic prompt optimization framework. First, it spontaneously discovers task-specific evaluation aspects. Second, a critique-suggestion module compares generated outputs against reference texts along these aspects and produces specific, actionable suggestions for revising the prompt. Third, an Automatic Suffix Tuning (AST) extension enables joint optimization across multiple metrics, moving beyond the conventional single-metric iterative paradigm. Experiments with four state-of-the-art large language models across nine summarization and question-answering datasets show a 3–4% ROUGE score improvement on summarization and substantial gains on various QA metrics, along with strong interpretability and optimization efficiency compared to prior approaches.

📝 Abstract
Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4% ROUGE score improvement on summarization and substantial improvement of various metrics on QA. Code available at https://github.com/amazon-science/crispo
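The iterative loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the functions `generate`, `critique_and_suggest`, and `revise_prompt` stand in for LLM calls and are stubbed here with deterministic placeholders, and the toy token-overlap metric stands in for ROUGE.

```python
def generate(prompt, example):
    """Stub for the task LLM: applies the prompt to one input."""
    return f"{prompt} -> {example}"

def score(output, reference):
    """Toy token-overlap metric standing in for ROUGE."""
    out, ref = set(output.split()), set(reference.split())
    return len(out & ref) / max(len(ref), 1)

def critique_and_suggest(prompt, outputs, references):
    """Stub for the critic LLM: in CriSPO it discovers aspects, compares
    outputs to references along them, and emits textual suggestions."""
    return "be more concise; cover key entities"

def revise_prompt(prompt, suggestion):
    """Stub for the optimizer LLM: rewrites the prompt given suggestions."""
    return f"{prompt} ({suggestion})"

def crispo_loop(seed_prompt, data, steps=3):
    """Keep the best-scoring prompt seen while iterating critique -> revise."""
    prompt, best_prompt, best_score = seed_prompt, seed_prompt, -1.0
    for _ in range(steps):
        outputs = [generate(prompt, x) for x, _ in data]
        avg = sum(score(o, r) for o, (_, r) in zip(outputs, data)) / len(data)
        if avg > best_score:
            best_prompt, best_score = prompt, avg
        suggestion = critique_and_suggest(prompt, outputs, [r for _, r in data])
        prompt = revise_prompt(prompt, suggestion)
    return best_prompt, best_score
```

In the actual method, the multi-aspect critiques give the optimizer LLM far richer feedback than a single scalar metric, which is what lets it explore a broader prompt search space; the AST extension then appends tunable suffixes to trade off multiple metrics jointly.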
Problem

Research questions and friction points this paper is trying to address.

Generative Tasks
Multi-dimensional Guidance
Text Quality and Diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

CriSPO
Multi-dimensional Guidance
AST Extension
Han He
Emory University
Natural Language Processing
Qianchu Liu
Microsoft Research
Natural Language Processing
Lei Xu
Amazon AWS AI Labs
Chaitanya P. Shivade
Amazon AWS AI Labs
Yi Zhang
Amazon AWS AI Labs
S. Srinivasan
Amazon AWS AI Labs
Katrin Kirchhoff
Oracle
AI · Natural Language Processing · Speech Recognition