The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) explicitly encode numerical attributes—such as birth years—in low-dimensional linear subspaces of their embedding space to support numerical comparison reasoning (e.g., “Was Cristiano Ronaldo born before Lionel Messi?”). We propose a method based on partial least squares regression to identify such numerical subspaces, combined with targeted hidden-state interventions and controlled ablation experiments. Our key contribution is the first empirical demonstration that mainstream LLMs linearly encode numerical information in dedicated low-dimensional subspaces. Crucially, we establish causal necessity: targeted intervention within these subspaces reliably flips model predictions on comparison tasks, inducing accuracy fluctuations exceeding 85%. These findings uncover the geometric underpinnings of numerical reasoning in LLMs and provide novel evidence for how symbolic numerical semantics map onto geometric representations in transformer-based models.

📝 Abstract
This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering questions involving numeric comparisons, e.g., "Was Cristiano born before Messi?" Using partial least squares regression, we first identify these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts. We then demonstrate causality by intervening in these subspaces to manipulate hidden states, thereby altering the LLM's comparison outcomes. Experiments on three different LLMs show that our results hold across different numerical attributes, indicating that LLMs utilize this linearly encoded information for numerical reasoning.
Problem

Research questions and friction points this paper is trying to address.

Do LLMs encode numerical attributes in low-dimensional linear subspaces?
Do these subspaces effectively encode the attributes needed for comparison?
Can interventions on hidden states causally alter comparison outcomes?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-dimensional subspace encoding
Partial least squares regression
Causality via subspace intervention