More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives

📅 2025-01-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the performance degradation of large language models (LLMs) in many-shot in-context learning (ICL) as the number of demonstrations increases. We first identify the suboptimality of the negative log-likelihood (NLL) objective and incremental noise accumulation as primary causes. To mitigate these issues, we propose DR-ICL, a novel method integrating global differentiated learning with a local advantage-driven dynamic reweighting mechanism. We further introduce MICLB, the first large-scale, multi-task many-shot ICL benchmark covering 1–350 shots and up to 8K context tokens, enabling systematic long-context synthesis and evaluation. Extensive experiments across seven NLP task categories and 50 datasets demonstrate consistent, significant improvements over state-of-the-art baselines, enhancing both in-domain and out-of-domain generalization. DR-ICL ensures that many-shot performance surpasses zero-shot levels, achieving stable gains even at 350-shot settings.
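The summary names the two ingredients but not their form. As a hedged sketch only (the notation $\ell_k$, $\bar{\ell}_{<k}$, $A_k$, and $\lambda$ is ours, not the paper's), a differentiated, advantage-reweighted NLL objective could look like

$$
\mathcal{L}_{\text{DR-ICL}}
= \sum_{k=1}^{K} w_k\,\ell_k
+ \lambda\,\max\bigl(0,\ \ell_K - \ell_0\bigr),
\qquad
w_k = \frac{\exp(A_k)}{\sum_{j=1}^{K}\exp(A_j)},
\quad
A_k = \bar{\ell}_{<k} - \ell_k,
$$

where $\ell_k$ is the NLL of the target given $k$ demonstrations and $\ell_0$ is the zero-shot NLL. The hinge term encodes the global requirement that many-shot loss not exceed the zero-shot level (differentiated learning), while the cumulative advantage $A_k$ up-weights demonstrations that improve on the running average $\bar{\ell}_{<k}$ (reweighting).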

📝 Abstract
Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as the number of ICL demonstrations increases from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DR-ICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives. Globally, DR-ICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby improving generalization. This approach allows the model to handle varying numbers of shots effectively, mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (MICLB), a large-scale benchmark covering shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes. MICLB facilitates the evaluation of many-shot ICL strategies across seven prominent NLP tasks and 50 distinct datasets. Experimental results demonstrate that LLMs enhanced with DR-ICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and benchmark dataset in the hope of facilitating further research in many-shot ICL.
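The abstract describes the mechanism only at a high level. The following is a minimal sketch, assuming a per-demonstration NLL vector is already available, of how such a differentiated, advantage-reweighted loss might be computed; the function name `dr_icl_loss`, the cumulative-average baseline, and the softmax weighting are illustrative choices, not the released implementation.

```python
import torch

def dr_icl_loss(per_shot_nll: torch.Tensor, zero_shot_nll: torch.Tensor,
                margin_weight: float = 1.0) -> torch.Tensor:
    """Illustrative differentiated + advantage-reweighted NLL (not the paper's code).

    per_shot_nll:  shape [K], NLL of the query answer after seeing 1..K demonstrations.
    zero_shot_nll: scalar, NLL of the same answer with no demonstrations.
    """
    k = per_shot_nll.numel()

    # Global differentiated term: penalize any shot count whose NLL is worse than
    # the zero-shot reference, so many-shot performance should not fall below zero-shot.
    differentiated = torch.relu(per_shot_nll - zero_shot_nll).mean()

    # Local advantage-based reweighting: compare each shot's NLL to the cumulative
    # average up to that shot and up-weight the demonstrations that reduce the loss.
    counts = torch.arange(1, k + 1, dtype=per_shot_nll.dtype)
    running_avg = torch.cumsum(per_shot_nll, dim=0) / counts
    advantage = running_avg - per_shot_nll          # > 0 when shot k improves the loss
    weights = torch.softmax(advantage, dim=0)       # normalized demonstration weights

    reweighted = (weights * per_shot_nll).sum()
    return reweighted + margin_weight * differentiated
```

For example, `dr_icl_loss(torch.tensor([2.1, 1.8, 1.9]), torch.tensor(2.3))` returns a scalar that could be backpropagated during many-shot fine-tuning under these assumptions.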
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Optimization Objectives
Data Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

DR-ICL
Many-Shot ICL Benchmark
Adaptive Learning
Xiaoqing Zhang
Gaoling School of Artificial Intelligence, Renmin University of China; Moonshot AI
Ang Lv
Renmin University of China
Language Model
Yuhan Liu
Gaoling School of Artificial Intelligence, Renmin University of China
Flood Sung
Moonshot AI
Foundation Models
LLM/VLM
Agent
Reinforcement Learning
Meta Learning
Wei Liu
Xiaomi AI Lab
Shuo Shang
Computer Science & AI Scientist
Spatial data
Spatiotemporal databases
Xiuying Chen
MBZUAI
Trustworthy NLP
Human-Centered NLP
Computational Social Science
Rui Yan
Gaoling School of Artificial Intelligence, Renmin University of China