Order Matters: Rethinking Prompt Construction in In-Context Learning

📅 2025-11-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Prior work on in-context learning (ICL) generally assumes that which examples are selected matters far more than how they are ordered, and so focuses on selection. This study revisits that assumption by systematically comparing the two effects on large language model (LLM) performance. Method: In controlled experiments on classification and generation tasks, the authors evaluate multiple open-source model families (0.5B–27B parameters) and GPT-5, permuting the examples while holding the example set fixed. Contribution/Results: Reordering examples produces performance variance comparable to that from replacing the entire example set, which puts ordering on an equal footing with selection. Strong orderings can also be identified using only a development set, with performance close to an oracle that picks the best ordering using test labels. The findings argue for treating example selection and ordering as equally important, intertwined parts of prompt design.

📝 Abstract

In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work reasonably and intuitively assumes that which examples are chosen has a far greater effect on performance than how those examples are ordered, leading to a focus on example selection. We revisit this assumption and conduct a systematic comparison between the effect of selection and ordering. Through controlled experiments on both classification and generation tasks, using multiple open-source model families (0.5B to 27B parameters) and GPT-5, we find that the variance in performance due to different example orderings is comparable to that from using entirely different example sets. Furthermore, we show that strong orderings can be identified using only a development set, achieving performance close to an oracle that selects the best ordering based on test labels. Our findings highlight the equal and intertwined importance of example selection and ordering in prompt design, calling for a reexamination of the assumptions held in ICL.

Problem

Research questions and friction points this paper is trying to address.

Investigates how example ordering impacts in-context learning performance

Compares variance from example selection versus example ordering strategies

Demonstrates ordering effects comparable to using different example sets
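The selection-versus-ordering comparison described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `ordering_vs_selection_spread`, its parameters, and the `score` callback (a stand-in for running a model on a prompt built from the given examples and returning a metric) are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label); a hypothetical representation


def ordering_vs_selection_spread(
    pool: Sequence[Example],
    k: int,
    n_sets: int,
    n_orders: int,
    score: Callable[[List[Example]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (std. dev. across example sets, std. dev. across orderings).

    `score` is assumed to evaluate a prompt built from the examples in the
    given order and return a single number such as accuracy.
    """
    rng = random.Random(seed)

    # Selection effect: score several different k-example sets.
    sets = [rng.sample(list(pool), k) for _ in range(n_sets)]
    selection_scores = [score(s) for s in sets]

    # Ordering effect: keep one set fixed and score random permutations of it.
    base = sets[0]
    ordering_scores = []
    for _ in range(n_orders):
        perm = list(base)
        rng.shuffle(perm)
        ordering_scores.append(score(perm))

    return (
        statistics.pstdev(selection_scores),
        statistics.pstdev(ordering_scores),
    )
```

If the second value is of the same magnitude as the first, ordering accounts for as much spread as selection under this toy protocol, which is the pattern the paper reports.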

Innovation

Methods, ideas, or system contributions that make the work stand out.

Performance variance from example ordering is comparable to that from example selection

Strong orderings can be identified using only a development set

Selection and ordering are equally important in prompt design
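As a rough sketch of the development-set idea, the snippet below picks the ordering that scores best on a development set and reports how it compares with an oracle that uses test labels. The paper's actual search procedure is not described here, so the bounded enumeration, the function name `dev_vs_oracle`, and the `dev_score` and `test_score` callbacks are hypothetical.

```python
import itertools
from typing import Callable, Sequence, Tuple


def dev_vs_oracle(
    examples: Sequence[str],
    dev_score: Callable[[Tuple[str, ...]], float],
    test_score: Callable[[Tuple[str, ...]], float],
    max_perms: int = 24,
) -> Tuple[Tuple[str, ...], float, float]:
    """Choose an ordering on the dev set and compare it with the oracle.

    `dev_score` and `test_score` are assumed to evaluate a prompt built from
    the examples in the given order on the dev and test sets respectively.
    Only the first `max_perms` permutations are considered, since the full
    space grows factorially with the number of examples.
    """
    candidates = list(
        itertools.islice(itertools.permutations(examples), max_perms)
    )
    chosen = max(candidates, key=dev_score)    # uses dev labels only
    oracle = max(candidates, key=test_score)   # uses test labels (upper bound)
    return chosen, test_score(chosen), test_score(oracle)
```

The gap between the last two returned values is the cost of not having test labels; the abstract reports that this gap is small in the authors' experiments.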
