Deep sequence models tend to memorize geometrically; it is unclear why

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing theoretical frameworks fail to explain how deep sequence models—particularly Transformers—encode global entity relationships geometrically, transcending local co-occurrence patterns (“geometrized memory”). Method: We introduce a parametric geometric perspective on memory, drawing analogies to Node2Vec and leveraging spectral bias analysis to uncover the intrinsic mechanism by which such geometric representations emerge naturally—even without explicit compression constraints. Results: We find that Transformers spontaneously embed atomic facts as points in a low-dimensional geometric space, reducing relational reasoning to concise, one-step geometric operations (e.g., distance comparison, directional alignment). This geometric representation substantially outperforms brute-force lookup tables and exhibits strong generalization. Crucially, our work provides the first systematic demonstration that Transformers internally instantiate an interpretable, analyzable, geometrized fact memory mechanism—challenging conventional associative memory assumptions and establishing a novel paradigm for understanding large-model reasoning.
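The contrast between brute-force lookup and one-step geometric operations can be made concrete with a toy sketch (my illustration, not from the paper): entities on a chain are trained only on adjacent pairs, yet a hypothetical 1-D embedding answers an ℓ-hop reachability query with a single coordinate comparison.

```python
import numpy as np

# Toy setup (hypothetical, not the paper's data): entities e_0..e_9 form a
# chain e_0 -> e_1 -> ... -> e_9, and training exposes only adjacent pairs.
edges = {(i, i + 1) for i in range(9)}

def reachable_lookup(a, b):
    """Associative view: multi-hop traversal over stored co-occurrences."""
    frontier = {a}
    while frontier:
        if b in frontier:
            return True
        frontier = {j for (i, j) in edges if i in frontier}
    return False

# Geometric view: each entity sits at a coordinate in a learned space, so the
# same l-fold composition collapses to one distance/direction comparison.
coords = np.arange(10, dtype=float)  # stand-in for a learned 1-D embedding

def reachable_geometric(a, b):
    return bool(coords[a] < coords[b])

assert reachable_lookup(2, 7) == reachable_geometric(2, 7) == True
assert reachable_lookup(7, 2) == reachable_geometric(7, 2) == False
```

The lookup cost grows with the composition length, while the geometric answer is constant-time, which is the intuition behind the claimed generalization advantage.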

📝 Abstract
In sequence modeling, the parametric memory of atomic facts has been predominantly abstracted as a brute-force lookup of co-occurrences between entities. We contrast this associative view against a geometric view of how memory is stored. We begin by isolating a clean and analyzable instance of Transformer reasoning that is incompatible with memory as strictly a storage of the local co-occurrences specified during training. Instead, the model must have somehow synthesized its own geometry of atomic facts, encoding global relationships between all entities, including non-co-occurring ones. This in turn has simplified a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn 1-step geometric task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, despite optimizing over mere local associations, cannot be straightforwardly attributed to typical architectural or optimizational pressures. Counterintuitively, an elegant geometry is learned even when it is not more succinct than a brute-force lookup of associations. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points to practitioners a visible headroom to make Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery and unlearning.
Problem

Research questions and friction points this paper is trying to address.

Explaining geometric memorization in deep sequence models
Contrasting associative versus geometric memory storage mechanisms
Identifying spectral bias origins in Transformer memory geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformers synthesize geometry of atomic facts
Geometry simplifies reasoning into one-step task
Spectral bias enables geometric memory without pressure
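The Node2Vec/spectral-bias connection can be sketched in a few lines (my illustration, assuming a simple chain graph): the first non-trivial eigenvector of the graph Laplacian (the Fiedler vector) already assigns a smooth 1-D coordinate that orders all nodes globally, even though the graph specifies only local adjacencies.

```python
import numpy as np

# Hypothetical chain of n entities; only local co-occurrences are specified.
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
fiedler = vecs[:, 1]                  # first non-trivial eigenvector

# The Fiedler vector's entries vary monotonically along the chain, so this
# single low-frequency coordinate recovers the global order of all entities
# (up to an overall sign flip, since eigenvectors are sign-ambiguous).
order = np.argsort(fiedler)
assert (order == np.arange(n)).all() or (order == np.arange(n)[::-1]).all()
```

Low-frequency (smooth) eigenvectors dominating the learned representation is one concrete way a spectral bias can yield global geometry from purely local training signals.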