🤖 AI Summary
This paper identifies a fundamental efficiency degradation in Transformer-based in-context learning (ICL) under long contexts, addressing a critical gap in our understanding of whether ICL is theoretically optimal.
Method: To address the lack of a formal optimality theory for ICL, the authors establish the first information-theoretic framework for quantifying ICL optimality. Within a simplified, interpretable learning setting, they rigorously analyze ICL’s generalization behavior without fine-tuning or data augmentation.
Results: They prove that while ICL initially approximates the Bayes-optimal predictor, its excess generalization error grows as Ω(√L) in the context length L, demonstrating an intrinsic information bottleneck rather than an artifact of training or architecture. This work provides the first quantitative characterization of, and a tight theoretical bound on, ICL's efficiency decay, laying a foundation for designing context-length-invariant, online adaptive learning algorithms.
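One way to write down a bound of this shape (a hedged reconstruction, not taken from the paper: the loss ℓ, the estimator notation, and the cumulative form of the excess error are all assumptions made for illustration):

```latex
% Excess generalization error of ICL relative to the Bayes-optimal
% predictor, accumulated over a context of length L (cumulative form
% is an assumption made here for illustration):
\mathrm{Excess}(L)
  \;=\; \sum_{t=1}^{L}
        \Bigl( \mathbb{E}\bigl[\ell\bigl(\hat{y}^{\mathrm{ICL}}_t,\, y_t\bigr)\bigr]
             - \mathbb{E}\bigl[\ell\bigl(\hat{y}^{\mathrm{Bayes}}_t,\, y_t\bigr)\bigr] \Bigr)
  \;=\; \Omega\bigl(\sqrt{L}\bigr).
```

Under this reading, ICL tracks the Bayes predictor early on (small per-step terms), but the accumulated gap is forced to grow at least as fast as √L.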
📝 Abstract
Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent transformers learn optimally in context compared to principled learning algorithms. To bridge this gap, we introduce a new framework for quantifying the optimality of ICL as a learning algorithm in stylized settings. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes-optimal estimator, its efficiency deteriorates significantly in long contexts. Through an information-theoretic analysis, we show that this diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods that avoid this diminishing efficiency.
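The dichotomy described above can be illustrated with a toy sketch. This is NOT the paper's construction: the 1-D Gaussian mean-estimation setting, the running-mean baseline, and the 1/√t-step online learner are illustrative assumptions, chosen only because they exhibit the same pattern of early Bayes-like efficiency followed by a cumulative excess error growing on the order of √L.

```python
# Toy sketch (not the paper's setting): estimating an unknown mean mu of
# Gaussian observations with noise variance sigma2. The running mean is the
# efficient baseline, with expected squared error sigma2 / t after t samples.
# An online update with step size 1/sqrt(t) stands in for a learner whose
# per-step efficiency decays; its cumulative excess error over the baseline
# grows roughly like sqrt(L), loosely mirroring the Omega(sqrt(L)) behavior.
import math

def cumulative_excess(L, sigma2=1.0):
    """Cumulative expected excess squared error of the 1/sqrt(t)-step
    online estimator over the running mean, across a context of length L."""
    # Expected squared error of the initial estimate 0, assuming mu ~ N(0, sigma2).
    err_online = sigma2
    excess = 0.0
    for t in range(1, L + 1):
        eta = 1.0 / math.sqrt(t)
        # Exact recursion for E[(m_t - mu)^2] under m_t = m_{t-1} + eta * (x_t - m_{t-1}),
        # with independent observation noise of variance sigma2.
        err_online = (1.0 - eta) ** 2 * err_online + eta ** 2 * sigma2
        err_baseline = sigma2 / t  # running-mean expected squared error
        excess += err_online - err_baseline
    return excess

for L in (100, 1000, 10000):
    print(L, round(cumulative_excess(L), 2))
```

Running this shows the cumulative excess roughly tripling each time L grows tenfold, as a √L growth rate predicts; both estimators are near-indistinguishable for small t, and the gap only accumulates at long horizons.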