Provable In-Context Vector Arithmetic via Retrieving Task Concepts

📅 2025-08-13
📈 Citations: 0 (influential: 0)
🤖 AI Summary
The phenomenon whereby large language models (LLMs) perform Word2Vec-style vector arithmetic over residual streams during in-context learning (ICL) to retrieve factual knowledge lacks a rigorous theoretical foundation. Method: We propose the first provable hierarchical concept modeling framework, grounded in nonlinear residual Transformer architectures. It abstracts ICL tasks as vector operations within a learned concept space and formally characterizes the co-optimization of concept extraction and residual-stream propagation. Contributions/Results: We provide the first rigorous proof of 0–1 loss convergence and strong generalization for vector arithmetic in ICL, revealing its robustness to concept recombination and distribution shift. Furthermore, we theoretically establish the superiority of Transformers over static embedding models in this setting. Experimental simulations corroborate all theoretical predictions.
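
To make the vector-arithmetic picture concrete, here is a minimal NumPy toy. It is an illustration under planted assumptions (random unit embeddings, a shared "get_capital" direction, illustrative country/capital names), not the paper's construction: once answers sit near v(query) + v(task), factual recall reduces to a nearest-neighbor lookup.

```python
# Toy sketch (planted assumptions, not the paper's construction): factual
# recall as Word2Vec-style vector arithmetic over an embedding table.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

def unit(v):
    return v / np.linalg.norm(v)

countries = ["Poland", "France", "Japan"]
capitals = ["Warsaw", "Paris", "Tokyo"]

# Plant a shared "get_capital" direction: v(capital) ~ v(country) + v(task).
task = unit(rng.standard_normal(dim))
emb = {c: unit(rng.standard_normal(dim)) for c in countries}
for country, capital in zip(countries, capitals):
    emb[capital] = unit(emb[country] + task + 0.05 * rng.standard_normal(dim))

def recall(country):
    """Retrieve the capital whose embedding is nearest to country + task."""
    query = emb[country] + task
    return max(capitals, key=lambda cap: emb[cap] @ query)

for c in countries:
    print(c, "->", recall(c))  # Poland -> Warsaw, France -> Paris, Japan -> Tokyo
```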

📝 Abstract
In-context learning (ICL) has garnered significant attention for its ability to grasp functions/tasks from demonstrations. Recent studies suggest the presence of a latent task/function vector in LLMs during ICL. Merullo et al. (2024) showed that LLMs leverage this vector alongside the residual stream for Word2Vec-like vector arithmetic to solve factual-recall ICL tasks. Additionally, recent work has empirically highlighted the key role of question-answer data in enhancing factual-recall capabilities. Despite these insights, a theoretical explanation remains elusive. To move one step forward, we propose a theoretical framework built on empirically grounded hierarchical concept modeling. We develop an optimization theory showing how nonlinear residual transformers trained via gradient descent on cross-entropy loss perform factual-recall ICL tasks via vector arithmetic. We prove 0–1 loss convergence and show strong generalization, including robustness to concept recombination and distribution shifts. These results elucidate the advantages of transformers over their static embedding predecessors. Empirical simulations corroborate our theoretical insights.
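
The following is a hedged PyTorch sketch of the training setup the abstract describes: a one-layer nonlinear (softmax-attention) transformer with a residual stream, trained by gradient descent on cross-entropy over a toy factual-recall ICL task. The architecture, prompt format [x1, f(x1), ..., x_query], frozen embedding scheme, and all hyperparameters are assumptions for illustration, not the paper's exact construction or proof setting.

```python
# Hedged sketch: one-layer softmax-attention transformer with a residual
# stream, trained by gradient descent on cross-entropy over a toy
# factual-recall ICL task. All details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_entities, n_demos = 32, 10, 3

# Frozen random embeddings: entity i (a "query") maps to entity
# n_entities + i (its "fact"), a planted factual-recall relation.
E = F.normalize(torch.randn(2 * n_entities, dim), dim=-1)

class OneLayerResidual(nn.Module):
    """One softmax-attention head whose output is added to the residual stream."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)

    def forward(self, x):  # x: (batch, seq, dim)
        att = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1) / dim ** 0.5, -1)
        return x + att @ self.v(x)  # nonlinear attention update + residual stream

def make_batch(batch=64):
    """Prompts [x1, f(x1), ..., xk, f(xk), x_query]; the label is f(x_query)."""
    idx = torch.stack([torch.randperm(n_entities)[: n_demos + 1] for _ in range(batch)])
    seq = torch.empty(batch, 2 * n_demos + 1, dim)
    seq[:, 0:2 * n_demos:2] = E[idx[:, :n_demos]]               # demo queries
    seq[:, 1:2 * n_demos:2] = E[idx[:, :n_demos] + n_entities]  # demo answers
    seq[:, -1] = E[idx[:, -1]]                                  # held-out query
    return seq, idx[:, -1] + n_entities

model = OneLayerResidual(dim)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
for step in range(500):
    seq, label = make_batch()
    logits = model(seq)[:, -1] @ E.T  # read the answer off the residual stream
    loss = F.cross_entropy(logits, label)
    opt.zero_grad(); loss.backward(); opt.step()

acc = (logits.argmax(-1) == label).float().mean()
print(f"cross-entropy {loss.item():.3f}, 0-1 accuracy {acc:.2f}")
```

Note the design shortcut: a single attention layer can fit this toy only because the fact map can be memorized in the value matrix; the point of the sketch is the training objective and residual-stream readout, not a minimal architecture claim.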
Problem

Research questions and friction points this paper is trying to address.

Understanding the latent task/function vector that LLMs form during ICL
Providing a rigorous theoretical explanation of factual-recall ICL via vector arithmetic
Demonstrating transformers' robustness to concept recombination and distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical concept modeling as an empirically grounded theoretical framework
Optimization theory for nonlinear residual transformers trained via gradient descent on cross-entropy loss
Proven robustness to concept recombination and distribution shifts (toy illustration below)
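
As a toy illustration of the recombination-robustness claim above, the sketch below plants a two-level concept hierarchy, estimates a task vector from a few seen concept combinations, and checks that retrieval still succeeds on an unseen recombination. The hierarchy depth, dimensions, and names are illustrative assumptions, not the paper's formal model.

```python
# Toy sketch of hierarchical concept modeling (the two-level hierarchy and all
# names are illustrative assumptions): entities combine a high-level and a
# low-level concept direction, and a task vector estimated from some
# combinations transfers to an unseen recombination of the same concepts.
import numpy as np

rng = np.random.default_rng(1)
dim = 128

def unit(v):
    return v / np.linalg.norm(v)

high = [unit(rng.standard_normal(dim)) for _ in range(4)]  # high-level concepts
low = [unit(rng.standard_normal(dim)) for _ in range(4)]   # low-level concepts
task = unit(rng.standard_normal(dim))                      # ground-truth task direction

def entity(h, l):
    """Query embedding for the concept combination (h, l)."""
    return unit(high[h] + low[l])

def answer(h, l):
    """Planted fact: the answer sits at entity + task (then renormalized)."""
    return unit(entity(h, l) + task)

seen = [(0, 0), (1, 1), (2, 2)]  # combinations available at "training" time
unseen = (0, 1)                  # a recombination never paired before

# Extract the task vector from seen demonstrations only.
est_task = unit(sum(answer(h, l) - entity(h, l) for h, l in seen))

# Retrieval on the unseen recombination: nearest answer to entity + est_task.
candidates = {f"ans{h}{l}": answer(h, l) for h in range(4) for l in range(4)}
query = entity(*unseen) + est_task
print(max(candidates, key=lambda k: candidates[k] @ query))  # expected: ans01
```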