Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies a fundamental degradation in the efficiency of Transformer-based in-context learning (ICL) under long contexts, addressing a critical gap in our understanding of whether ICL is theoretically optimal. Method: Lacking a formal optimality theory for ICL, the authors establish the first information-theoretic framework for quantifying ICL optimality. Within a simplified, interpretable learning setting, they rigorously analyze ICL's generalization behavior without fine-tuning or data augmentation. Results: They prove that while ICL initially approximates the Bayes-optimal predictor, its excess generalization error deteriorates as Ω(√L) with context length L, demonstrating an intrinsic information bottleneck rather than an artifact of training or architecture. The work provides the first quantitative characterization of, and a tight theoretical bound on, ICL's efficiency decay, laying a foundation for designing context-length-invariant, online adaptive learning algorithms.
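The dichotomy claimed above can be condensed into a short formal sketch. The notation below (risk R, context length L) is our own shorthand for readability, not necessarily the paper's:

```latex
% Shorthand, not the paper's own notation:
%   R_{ICL}(L): generalization risk of the ICL predictor given L demonstrations
%   R^{*}(L):   risk of the Bayes-optimal estimator in the same stylized setting
\underbrace{R_{\mathrm{ICL}}(L) \approx R^{*}(L)}_{\text{short contexts: near Bayes-optimal}}
\qquad \text{vs.} \qquad
\underbrace{R_{\mathrm{ICL}}(L) - R^{*}(L) = \Omega\!\left(\sqrt{L}\right)}_{\text{long contexts: diminishing efficiency}}
```

The lower bound on the excess error is what makes the result an intrinsic limit: it holds regardless of how the model is trained, rather than describing one particular architecture's failure mode.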

📝 Abstract
Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent the transformers optimally learn in-context compared to principled learning algorithms. To bridge this gap, we introduce a new framework for quantifying optimality of ICL as a learning algorithm in stylized settings. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes optimal estimator, its efficiency significantly deteriorates in long context. Through an information-theoretic analysis, we show that the diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods without the diminishing efficiency.
Problem

Research questions and friction points this paper is trying to address.

How optimally do transformers learn in-context compared to principled learning algorithms?
Does ICL's efficiency persist as the context grows, or does it degrade in long contexts?
What adaptive methods could avoid ICL's limitations as a universal problem solver?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the first information-theoretic framework for quantifying ICL optimality
Proves a dichotomy: near-Bayes-optimal efficiency in short contexts, intrinsic efficiency decay in long contexts
Motivates context-length-invariant, on-the-fly adaptive methods