🤖 AI Summary
This paper identifies a fundamental efficiency degradation in Transformer-based in-context learning (ICL) under long contexts, addressing a critical gap in our understanding of whether ICL is theoretically optimal.
Method: To address the lack of a formal optimality theory for ICL, the authors establish the first information-theoretic framework for quantifying ICL optimality. Within a simplified, interpretable learning setting, they rigorously analyze ICL’s generalization behavior without fine-tuning or data augmentation.
Results: They prove that while ICL initially approximates the Bayes-optimal predictor, its excess generalization error grows as Ω(√L) in the context length L, demonstrating an intrinsic information bottleneck rather than an artifact of training or architecture. This work provides the first quantitative characterization of, and a tight theoretical bound on, ICL's efficiency decay, laying a foundation for designing context-length-invariant, online adaptive learning algorithms.
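One way to write down a bound of this shape (a hedged reconstruction, not taken from the paper: the loss ℓ, the estimator notation, and the cumulative form of the excess error are all assumptions made for illustration):

```latex
% Excess generalization error of ICL relative to the Bayes-optimal
% predictor, accumulated over a context of length L (cumulative form
% is an assumption made here for illustration):
\mathrm{Excess}(L)
  \;=\; \sum_{t=1}^{L}
        \Bigl( \mathbb{E}\bigl[\ell\bigl(\hat{y}^{\mathrm{ICL}}_t,\, y_t\bigr)\bigr]
             - \mathbb{E}\bigl[\ell\bigl(\hat{y}^{\mathrm{Bayes}}_t,\, y_t\bigr)\bigr] \Bigr)
  \;=\; \Omega\bigl(\sqrt{L}\bigr).
```

Under this reading, ICL tracks the Bayes predictor early on (small per-step terms), but the accumulated gap is forced to grow at least as fast as √L.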
📝 Abstract
Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent transformers learn optimally in context compared to principled learning algorithms. To bridge this gap, we introduce a new framework for quantifying the optimality of ICL as a learning algorithm in stylized settings. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes-optimal estimator, its efficiency deteriorates significantly in long contexts. Through an information-theoretic analysis, we show that this diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods that avoid this diminishing efficiency.
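The dichotomy described above can be illustrated with a toy sketch. This is NOT the paper's construction: the 1-D Gaussian mean-estimation setting, the running-mean baseline, and the 1/√t-step online learner are illustrative assumptions, chosen only because they exhibit the same pattern of early Bayes-like efficiency followed by a cumulative excess error growing on the order of √L.

```python
# Toy sketch (not the paper's setting): estimating an unknown mean mu of
# Gaussian observations with noise variance sigma2. The running mean is the
# efficient baseline, with expected squared error sigma2 / t after t samples.
# An online update with step size 1/sqrt(t) stands in for a learner whose
# per-step efficiency decays; its cumulative excess error over the baseline
# grows roughly like sqrt(L), loosely mirroring the Omega(sqrt(L)) behavior.
import math

def cumulative_excess(L, sigma2=1.0):
    """Cumulative expected excess squared error of the 1/sqrt(t)-step
    online estimator over the running mean, across a context of length L."""
    # Expected squared error of the initial estimate 0, assuming mu ~ N(0, sigma2).
    err_online = sigma2
    excess = 0.0
    for t in range(1, L + 1):
        eta = 1.0 / math.sqrt(t)
        # Exact recursion for E[(m_t - mu)^2] under m_t = m_{t-1} + eta * (x_t - m_{t-1}),
        # with independent observation noise of variance sigma2.
        err_online = (1.0 - eta) ** 2 * err_online + eta ** 2 * sigma2
        err_baseline = sigma2 / t  # running-mean expected squared error
        excess += err_online - err_baseline
    return excess

for L in (100, 1000, 10000):
    print(L, round(cumulative_excess(L), 2))
```

Running this shows the cumulative excess roughly tripling each time L grows tenfold, as a √L growth rate predicts; both estimators are near-indistinguishable for small t, and the gap only accumulates at long horizons.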