From Compression to Expansion: A Layerwise Analysis of In-Context Learning

📅 2025-05-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how task information is represented across the layers of large language models (LLMs) during in-context learning (ICL). Method: The authors identify and name a *Layerwise Compression-Expansion* dynamic: early layers progressively compress the input demonstrations into compact, discriminative representations that encode the task, while later layers expand those representations to incorporate the query and generate the prediction. They support this account with statistical geometric analysis, layerwise representation visualization, and a bias-variance decomposition accompanied by a theoretical analysis of the attention mechanism. Contribution/Results: They validate the dynamic empirically across diverse tasks and mainstream LLMs; connect it to how ICL performance scales with model size and shot count; and show improved prediction robustness under noisy demonstrations, with the theory explaining how attention reduces both bias and variance as the number of demonstrations grows.

📝 Abstract
In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks without weight updates by learning from demonstration sequences. While ICL shows strong empirical performance, its internal representational mechanisms are not yet well understood. In this work, we conduct a statistical geometric analysis of ICL representations to investigate how task-specific information is captured across layers. Our analysis reveals an intriguing phenomenon, which we term *Layerwise Compression-Expansion*: early layers progressively produce compact and discriminative representations that encode task information from the input demonstrations, while later layers expand these representations to incorporate the query and generate the prediction. This phenomenon is observed consistently across diverse tasks and a range of contemporary LLM architectures. We demonstrate that it has important implications for ICL performance -- improving with model size and the number of demonstrations -- and for robustness in the presence of noisy examples. To further understand the effect of the compact task representation, we propose a bias-variance decomposition and provide a theoretical analysis showing how attention mechanisms contribute to reducing both variance and bias, thereby enhancing performance as the number of demonstrations increases. Our findings reveal an intriguing layerwise dynamic in ICL, highlight how structured representations emerge within LLMs, and showcase that analyzing internal representations can facilitate a deeper understanding of model behavior.
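The paper's statistical geometric analysis tracks how compact each layer's representations are. The exact metric is not specified in this summary; a common proxy for effective dimensionality is the participation ratio of the covariance eigenvalues, sketched below on synthetic stand-in data (a real analysis would extract per-layer hidden states from an actual LLM; the layer ranks here are illustrative, not from the paper).

```python
import numpy as np

def participation_ratio(reps: np.ndarray) -> float:
    """Effective dimensionality of a set of representations.

    reps: (n_samples, d) matrix of hidden states from one layer.
    PR = (sum_i lam_i)^2 / (sum_i lam_i^2) over covariance eigenvalues;
    lower PR means a more compressed, lower-dimensional representation.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(reps) - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())

# Synthetic stand-in for per-layer hidden states: middle "layers" span
# fewer directions (compression), the last ones span more (expansion).
rng = np.random.default_rng(0)
d, n = 64, 200
for layer, rank in enumerate([32, 8, 4, 16, 48]):
    basis = rng.standard_normal((rank, d))
    states = rng.standard_normal((n, rank)) @ basis
    print(f"layer {layer}: PR ~ {participation_ratio(states):.1f}")
```

Plotted against layer index on real hidden states, a dip followed by a rise in this kind of metric is the signature the paper names compression-then-expansion.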
Problem

Research questions and friction points this paper is trying to address.

Understanding internal representational mechanisms of in-context learning in LLMs
Analyzing layerwise compression-expansion phenomenon in ICL representations
Investigating how attention mechanisms reduce bias and variance in ICL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identification of the Layerwise Compression-Expansion phenomenon in ICL representations
Bias-variance decomposition showing how attention reduces both bias and variance
Demonstration that analyzing internal representations deepens understanding of model behavior
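The bias-variance claim can be illustrated with a toy model: treat the model's task estimate as an aggregate over k noisy demonstrations (a simplifying assumption standing in for the paper's attention-based analysis, not its actual derivation). Under uniform aggregation the variance of the estimate falls roughly as 1/k, matching the observed gains from more shots.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_W = 2.0   # ground-truth task parameter (illustrative)
NOISE = 0.5    # demonstration noise level (illustrative)

def estimate_from_demos(k: int, trials: int = 2000) -> tuple[float, float]:
    """Bias and variance of a uniform aggregate over k noisy demos."""
    demos = TRUE_W + NOISE * rng.standard_normal((trials, k))
    est = demos.mean(axis=1)          # uniform-attention aggregate
    return float(est.mean() - TRUE_W), float(est.var())

for k in (1, 4, 16, 64):
    bias, var = estimate_from_demos(k)
    print(f"k={k:3d}  bias~{bias:+.3f}  var~{var:.4f}")
```

The paper's theoretical contribution goes further, arguing that attention reduces bias as well as variance; this sketch only captures the variance side.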