Distinct Computations Emerge From Compositional Curricula in In-Context Learning

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether introducing structured subtask curricula in in-context learning can induce compositional computation capabilities in Transformers and improve zero-shot generalization and robustness on unseen compositional tasks. Method: Using modular exponentiation (double-exponentiation) and its single-exponentiation subtasks as a benchmark, we compare progressive subtask curricula against end-to-end direct training under identical context length constraints. Contribution/Results: We report the first empirical evidence that curriculum-based context presentation enables zero-shot reasoning on novel compositional tasks, reducing error rates by 42% relative to direct training. Representation analysis reveals the emergence of hierarchical, decomposable internal computation structures. Moreover, varying curriculum designs elicit diverse reasoning strategies, significantly enhancing contextual robustness. These findings demonstrate that curriculum design exerts a measurable, plastic influence on the intrinsic computational mechanisms of large language models.

Technology Category

Application Category

📝 Abstract
In-context learning (ICL) research often considers learning a function in-context through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context may alter the computations a transformer learns. We design a compositional algorithmic task based on the modular exponential-a double exponential task composed of two single exponential subtasks and train transformer models to learn the task in-context. We compare (a) models trained using an in-context curriculum consisting of single exponential subtasks and, (b) models trained directly on the double exponential task without such a curriculum. We show that models trained with a subtask curriculum can perform zero-shot inference on unseen compositional tasks and are more robust given the same context length. We study how the task and subtasks are represented across the two training regimes. We find that the models employ diverse strategies modulated by the specific curriculum design.
Problem

Research questions and friction points this paper is trying to address.

Investigates how compositional subtask curricula affect transformer computations
Compares curriculum-trained models vs direct training on complex tasks
Analyzes representation strategies across different curriculum designs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional subtask curriculum in-context learning
Transformer models trained on modular exponential task
Zero-shot inference on unseen compositional tasks
Jin Hwa Lee
Jin Hwa Lee
University College London, Sainsbury Wellcome Centre
Computational Neuroscience
A
Andrew K. Lampinen
Google Deepmind, Mountain View, CA, USA
A
Aaditya K. Singh
University College London, London, UK
A
Andrew M. Saxe
University College London, London, UK, CIFAR Azrieli Global Scholar, CIFAR