🤖 AI Summary
Large language models (LLMs) exhibit opaque decision-making processes, hindering interpretability and mechanistic understanding.
Method: We systematically analyze the geometric evolution of hidden representations across layers of 28 open-source Transformer models on multiple-choice question answering (MCQA), quantifying intrinsic dimensionality (ID) layerwise using multiple ID estimators.
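The summary does not specify which ID estimators were used or how they were implemented, but a common choice for this kind of layerwise analysis is the TwoNN estimator (Facco et al., 2017), which infers dimension from the ratio of each point's second- to first-nearest-neighbor distance. A minimal sketch, with illustrative names and no claim to match the paper's implementation:

```python
# Minimal sketch of the TwoNN intrinsic-dimension estimator.
# Input: hidden representations X of shape (n_samples, n_features),
# e.g. one layer's activations over a batch of MCQA prompts.
import numpy as np
from scipy.spatial import cKDTree

def twonn_id(X: np.ndarray) -> float:
    """Maximum-likelihood TwoNN estimate of intrinsic dimension."""
    tree = cKDTree(X)
    # k=3: the nearest hit is the point itself, then its 1st and 2nd neighbors
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1                          # neighbor-distance ratios
    mu = mu[np.isfinite(mu) & (mu > 1)]   # guard against duplicate points
    # MLE under the TwoNN model: d = N / sum(log mu_i)
    return len(mu) / np.sum(np.log(mu))

# Sanity check: a 2-D plane linearly embedded in 50-D ambient space
rng = np.random.default_rng(0)
plane = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))
print(twonn_id(plane))  # close to 2, despite the 50-D embedding
```

Applied per layer, the estimator yields the ID-vs-depth profile the study analyzes; the key property illustrated above is that it recovers the manifold dimension, not the ambient width of the hidden state.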
Results: We identify a consistent three-phase ID dynamic—“low-dimensional → expansion → compression”—across models and datasets: early layers rapidly project inputs onto task-relevant low-dimensional manifolds; middle layers expand representations to support reasoning; late layers compress them into discriminative subspaces. This universal geometric pattern provides the first empirical evidence of a shared structural basis underlying LLM generalization and emergent reasoning capabilities. It establishes a verifiable, structured interpretability framework for probing implicit decision mechanisms in LLMs.
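The three-phase shape can be illustrated on synthetic data. As a cheap stand-in for the paper's ID estimators (an assumption for illustration, not the authors' method) the sketch below uses the PCA participation ratio, and fabricates a stack of "layers" whose latent dimensionality goes low → high → low; the dimensionality profile then peaks mid-stack, mirroring the expansion phase:

```python
# Illustrative layerwise dimensionality profile on synthetic "hidden states".
# participation_ratio is a PCA-based proxy for dimensionality:
# (sum(lambda))^2 / sum(lambda^2) over covariance eigenvalues.
import numpy as np

def participation_ratio(X: np.ndarray) -> float:
    X = X - X.mean(axis=0)
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0, None)
    return lam.sum() ** 2 / np.square(lam).sum()

rng = np.random.default_rng(0)

def fake_layer(d_latent: int, d_ambient: int = 64, n: int = 500) -> np.ndarray:
    """A d_latent-dimensional Gaussian linearly embedded in d_ambient dims."""
    return rng.normal(size=(n, d_latent)) @ rng.normal(size=(d_latent, d_ambient))

# low-dimensional -> expansion -> compression across the stack
layers = [fake_layer(d) for d in (3, 4, 20, 24, 20, 6, 4)]
profile = [participation_ratio(h) for h in layers]
print(profile)  # the peak lands in the middle layers
```

This is only a geometric toy, but it makes the claimed signature concrete: the quantity plotted against depth rises through the middle of the network and falls toward the output.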
📝 Abstract
Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of *intrinsic dimension* (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study of 28 open-weight transformer models, estimating ID across layers with multiple estimators while also quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.