Assign and Add: A Mechanistic Study of Compositional Arithmetic

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study investigates how transformers generalize to tasks unseen during training by composing skills they have already learned, using a controlled setting that combines variable assignment and modular addition. With training data partitioned into disjoint sets, mechanistic interpretability analyses, and tracking of training dynamics, the work examines compositional generalization in small transformer models. It finds that the same modular addition MLP module is reused whether inputs are given directly or indirectly through variable assignment. It also identifies three phases of learning and provides a theoretical framework explaining how compositionality emerges from training dynamics. The small transformers generalize to previously unseen combinations of variables and numbers, suggesting that compositional generalization can follow naturally from the compositionality of internal mechanisms.
📝 Abstract
Large language models are able to compose skills in order to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics from an empirical lens, which reveals three phases of learning: first, modular addition is learned, then the structure required for variable assignment, and finally a refinement phase where the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.
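To make the setting concrete, here is a minimal sketch of how assign-then-add sequences with disjoint train and test combinations might be generated. The abstract does not specify the prompt format, modulus, variable names, or split procedure, so everything below (`P`, `VARIABLES`, `make_example`, `split_combinations`, the `a=3 b=5 a+b=` layout, and the 25% held-out fraction) is a hypothetical illustration, not the paper's actual pipeline.

```python
import itertools
import random

P = 7  # hypothetical modulus; the paper's value is not given in the abstract
VARIABLES = ["a", "b", "c", "d"]  # hypothetical variable names


def make_example(var_x, var_y, x, y, p=P):
    """Build one prompt: two assignments followed by an addition query.

    The textual layout is an assumption made for illustration only.
    """
    prompt = f"{var_x}={x} {var_y}={y} {var_x}+{var_y}="
    answer = (x + y) % p  # modular addition of the assigned values
    return prompt, answer


def split_combinations(variables, p=P, test_fraction=0.25, seed=0):
    """Partition (variable pair, value pair) combinations into disjoint sets."""
    var_pairs = list(itertools.permutations(variables, 2))
    value_pairs = list(itertools.product(range(p), repeat=2))
    combos = list(itertools.product(var_pairs, value_pairs))
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_test = int(len(combos) * test_fraction)
    # Held-out combinations never appear in the training portion.
    return combos[n_test:], combos[:n_test]


train, test = split_combinations(VARIABLES)
train_examples = [make_example(vx, vy, x, y) for (vx, vy), (x, y) in train]
test_examples = [make_example(vx, vy, x, y) for (vx, vy), (x, y) in test]
```

Under these assumptions, a model trained on `train_examples` would be evaluated on `test_examples`, whose particular pairings of variables and numbers it has never seen, even though each individual variable and number may appear in training.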
Problem

Research questions and friction points this paper is trying to address.

compositional generalization
transformers
mechanistic analysis
modular addition
variable assignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generalization
transformer mechanisms
modular addition
variable assignment
training dynamics