In-Context Learning with Long-Context Models: An In-Depth Exploration

📅 2024-04-30
🏛️ arXiv.org
📈 Citations: 54
Influential: 6
🤖 AI Summary
This work investigates the behavior of in-context learning (ICL) with thousands of demonstrations in long-context language models. Through systematic experiments across multiple models and datasets, using controlled analyses—random shuffling of demonstrations, grouping of same-label examples, and demonstration subsampling—the authors find that: (1) ICL robustness to input ordering increases with context length; (2) grouping examples by label degrades performance; and (3) the gains do not arise from cumulative benefit of encoding many demonstrations together. Key contributions include: empirical evidence that ICL performance continues to scale with demonstration count up to several thousand on tasks with large label spaces; stronger performance than finetuning in low-to-moderate data regimes (though finetuning can surpass long-context ICL with additional data); and the observation that substantial gains may be achievable without fully utilizing available context capacity—challenging prevailing assumptions about ICL mechanisms.

📝 Abstract
As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. We show that, for many datasets with large label spaces, performance continues to increase with thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths but has diminished gains with more demonstrations; finetuning is more data hungry than ICL but can exceed long-context ICL performance with additional data. We use the ICL setting to study several properties of both in-context learning and long-context models. We show that long-context ICL is less sensitive to random input shuffling than short-context ICL, that grouping of same-label examples negatively impacts performance, and that the performance boosts do not arise from cumulative gain from encoding many examples together. We conclude that long-context ICL can be an effective tool, and may not require long-context for encoding the demonstration set at all.
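The many-shot setting described above amounts to packing a large sampled subset of the training data into the prompt as labeled demonstrations. A minimal sketch of that prompt construction (the formatting, function name, and toy data are hypothetical, not the paper's exact pipeline):

```python
import random

def build_icl_prompt(demos, query, k, seed=0):
    """Assemble a many-shot ICL prompt from k sampled demonstrations.

    demos: list of (text, label) pairs. For long-context models,
    k can reach the thousands, bounded only by the context window.
    """
    rng = random.Random(seed)
    sampled = rng.sample(demos, k)
    lines = [f"Input: {x}\nLabel: {y}" for x, y in sampled]
    # The query goes last, with its label left for the model to fill in.
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

# Toy demonstration pool (hypothetical data, 5 labels).
demos = [(f"example {i}", f"class_{i % 5}") for i in range(2000)]
prompt = build_icl_prompt(demos, "a new input", k=1000)
print(prompt.count("Label:"))  # 1001: 1000 demos + 1 query slot
```

The same sampled pool can then be reused across orderings and subsample sizes, which is how the paper's controlled comparisons are framed.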
Problem

Research questions and friction points this paper is trying to address.

How does in-context learning behave when thousands of demonstrations fill a long context window?
How does long-context ICL compare to example retrieval and finetuning as the number of demonstrations grows?
Which properties of ICL and of long-context models explain the observed performance gains?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows that ICL performance keeps improving with thousands of demonstrations on datasets with large label spaces.
Demonstrates that long-context ICL is less sensitive to random input shuffling than short-context ICL.
Finds that grouping same-label examples negatively impacts performance, and that the gains do not come from encoding many examples together.
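The two ordering manipulations behind these findings—random shuffling versus grouping same-label examples contiguously—can be sketched as follows (function names and the toy examples are illustrative, not the paper's code):

```python
import random
from itertools import groupby

def shuffled_order(demos, seed=0):
    """Random demonstration ordering; the paper reports long-context
    ICL is less sensitive to this than short-context ICL."""
    rng = random.Random(seed)
    out = list(demos)
    rng.shuffle(out)
    return out

def label_grouped_order(demos):
    """Sort so same-label examples are contiguous, the ordering
    the paper reports as harmful to performance."""
    return sorted(demos, key=lambda d: d[1])

# Toy sentiment demonstrations (hypothetical data).
demos = [("good movie", "pos"), ("bad plot", "neg"),
         ("loved it", "pos"), ("dull", "neg")]
grouped = label_grouped_order(demos)
labels = [y for _, y in grouped]
# One contiguous run per label after grouping.
runs = len([k for k, _ in groupby(labels)])
print(runs)  # 2
```

Comparing model accuracy under these two orderings (plus subsampled prompts) is what isolates ordering effects from demonstration count.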