The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) explicitly encode numerical attributes—such as birth years—in low-dimensional linear subspaces of their embedding space to support numerical comparison reasoning (e.g., “Was Cristiano Ronaldo born before Lionel Messi?”). We propose a method based on partial least squares regression to identify such numerical subspaces, combined with targeted hidden-state interventions and controlled ablation experiments. Our key contribution is the first empirical demonstration that mainstream LLMs linearly encode numerical information in dedicated low-dimensional subspaces. Crucially, we establish causal necessity: targeted intervention within these subspaces reliably flips model predictions on comparison tasks, inducing accuracy fluctuations exceeding 85%. These findings uncover the geometric underpinnings of numerical reasoning in LLMs and provide novel evidence for how symbolic numerical semantics map onto geometric representations in transformer-based models.

📝 Abstract
This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering questions involving numeric comparisons, e.g., "Was Cristiano born before Messi?" Using partial least squares regression, we first identify these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts. We then demonstrate causality by intervening in these subspaces to manipulate hidden states, thereby altering the LLM's comparison outcomes. Experiments on three different LLMs show that our results hold across different numerical attributes, indicating that LLMs utilize this linearly encoded information for numerical reasoning.
Problem

Research questions and friction points this paper is trying to address.

Do LLMs encode numerical attributes in low-dimensional linear subspaces?
Do these subspaces effectively encode the attributes needed for comparison?
Can interventions on hidden states causally alter comparison outcomes?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-dimensional subspace encoding
Partial least squares regression
Causality via subspace intervention