🤖 AI Summary
This study investigates how Transformers use attention mechanisms to generalize on structured multi-hop reasoning tasks, with a particular focus on why positional and symbolic mechanisms differ in how well they extrapolate to longer sequences. The authors train GPT-J models on two structurally equivalent tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. They classify attention-head behavior with a recently introduced positional-versus-symbolic metric and analyze the geometry of Rotary Position Embeddings (RoPE). From this, the work distinguishes positional from symbolic attention heads and gives a theoretical characterization of their computational roles. The authors introduce a notion of “discrepancy” that quantitatively separates the two mechanisms and predicts that symbolic mechanisms are more robust to longer sequences than positional ones. They also find that successful learning is associated with the emergence of pure heads, i.e., heads that behave as either positional or symbolic. Finally, they validate the extrapolation predictions in both controlled and real-world models.
📝 Abstract
Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. Using a recently introduced metric that classifies attention-head behavior as positional or symbolic for a given prompt, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic. Despite the tasks' structural equivalence, they impose different mechanistic demands: the number task requires both positional and symbolic heads, whereas the letter task requires only symbolic heads. We then identify the computational roles of these heads, characterize the basic functions they implement, and give theoretical constructions showing how single-layer RoPE-based attention can realize these functions through geometrically interpretable query, key, and value operations. This analysis yields a quantitative separation between positional and symbolic mechanisms in their robustness to longer sequences, formalized through a novel notion of discrepancy. We empirically validate the resulting predictions in both controlled and real-world models, showing that symbolic mechanisms extrapolate more reliably to longer sequences while positional mechanisms face sharper limitations.
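The theoretical constructions above rely on the geometry of RoPE. The pure-Python sketch below is not the authors' construction. It assumes the standard RoPE scheme, in which consecutive dimension pairs are rotated by angles `pos * base**(-2i/d)` with `base = 10000`, and it checks the property that makes positional heads possible: after queries and keys are rotated by position-dependent angles, the pre-softmax attention logit depends only on the relative offset `m - n`, not on absolute position. The helpers `rope_rotate` and `score` are hypothetical names chosen for illustration.

```python
import cmath
import random


def rope_rotate(x, pos, base=10000.0):
    """Apply a standard RoPE rotation (assumed scheme) to a real vector x at position pos.

    Each pair (x[2i], x[2i+1]) is treated as a complex number and multiplied by
    exp(1j * pos * theta_i), with theta_i = base ** (-2i / d).
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out


def score(q, k, m, n):
    """Pre-softmax attention logit for a query at position m and a key at position n."""
    qr, kr = rope_rotate(q, m), rope_rotate(k, n)
    return sum(a * b for a, b in zip(qr, kr))


random.seed(0)
d = 8
q = [random.gauss(0, 1) for _ in range(d)]
k = [random.gauss(0, 1) for _ in range(d)]

# Same relative offset (m - n = 3) at very different absolute positions.
s_near = score(q, k, 5, 2)
s_far = score(q, k, 1005, 1002)
print(abs(s_near - s_far) < 1e-9)  # the logit depends only on m - n
```

The invariance holds because each rotated pair contributes `Re(z_q * conj(z_k) * exp(1j * (m - n) * theta_i))` to the dot product, so the absolute positions cancel. This sketch only demonstrates the relative-position property. It does not reproduce the paper's discrepancy measure or its extrapolation analysis.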