🤖 AI Summary
This work investigates the intrinsic dimension (ID) of token embedding spaces in language models to characterize representational redundancy and to examine the role of ID in model scaling, training dynamics, and efficient fine-tuning. We propose a systematic ID estimation framework based on the TwoNN algorithm, enabling a multi-scale characterization of how ID evolves across the embedding layers of increasingly large models. We find that the ID is substantially lower than the extrinsic embedding dimension, converges rapidly during early training, and is closely tied to the rank needed for effective Low-Rank Adaptation (LoRA). Crucially, setting the LoRA rank near the estimated ID yields a marked perplexity reduction, suggesting that the ID can serve as an interpretable and transferable hyperparameter for efficient adaptation. Our results provide both theoretical insight into the low-dimensional structure of language model representations and practical guidance for designing parameter-efficient fine-tuning methods.
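The TwoNN estimator mentioned above can be sketched in a few lines: for each point, take the ratio μ = r₂/r₁ of its second- to first-nearest-neighbor distances, which under the TwoNN model follows a Pareto law whose shape parameter is the intrinsic dimension, giving the maximum-likelihood estimate d = N / Σᵢ log μᵢ. This is a minimal illustrative sketch, not the paper's implementation; the synthetic data (a 2-D plane embedded in 64 dimensions) is an assumption chosen only to show that the estimate tracks the manifold dimension rather than the extrinsic one.

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    Uses mu_i = r2_i / r1_i, the ratio of each point's second- to
    first-nearest-neighbor distance, and returns N / sum(log mu_i).
    """
    # Squared pairwise distances via the Gram trick (O(N^2) memory).
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # exclude self-distances
    d2.sort(axis=1)
    r1 = np.sqrt(np.clip(d2[:, 0], 0.0, None))  # nearest neighbor
    r2 = np.sqrt(np.clip(d2[:, 1], 0.0, None))  # second nearest
    mu = r2 / r1
    return len(mu) / np.log(mu).sum()

# Sanity check (illustrative): 1000 points on a 2-D Gaussian plane
# rotated into a 64-dimensional ambient space. The estimate should
# land near 2, far below the extrinsic dimension of 64.
rng = np.random.default_rng(0)
coords = rng.normal(size=(1000, 2))
basis = np.linalg.qr(rng.normal(size=(64, 2)))[0]  # 64x2 orthonormal
X = coords @ basis.T
print(f"estimated ID ~ {twonn_id(X):.1f} (extrinsic dim = {X.shape[1]})")
```

Because the estimator only looks at the two nearest neighbors of each point, it is largely insensitive to the curvature and density variations that trip up global methods such as PCA, which is what makes it usable on embedding spaces like those studied here.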
📝 Abstract
In this study, we measure the Intrinsic Dimension (ID) of token embeddings to estimate the dimensions of the manifolds spanned by the representations, and thereby quantify their redundancy relative to their extrinsic dimensionality. Specifically, (1) we estimate the ID of token embeddings in both small-scale language models and modern large language models, finding that the embedding spaces often reside on manifolds of much lower dimension than their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe that the redundancy rate increases as the model scale grows; (3) we track the dynamics of the ID during training and find a rapid drop in the early stages; and (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated ID, suggesting that the ID can serve as a useful guideline for choosing the LoRA rank.
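Finding (4) amounts to choosing the LoRA rank r to match the estimated ID of the embedding layer. The mechanics can be illustrated with a minimal low-rank update in plain NumPy; the shapes, the ID value of 8, and the scaling convention W + (α/r)·BA are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 1000, 64
estimated_id = 8               # hypothetical TwoNN estimate for this layer

W = rng.normal(size=(vocab, d_model))   # frozen embedding matrix
# LoRA adds a trainable low-rank update B @ A; rank r is set to the ID.
r, alpha = estimated_id, 16
A = rng.normal(scale=0.01, size=(r, d_model))
B = np.zeros((vocab, r))                # zero-init so training starts at W
W_adapted = W + (alpha / r) * (B @ A)

# Only A and B are trained: (vocab + d_model) * r parameters instead of
# the full vocab * d_model embedding matrix.
trainable = A.size + B.size
print(f"trainable: {trainable} vs full: {W.size}")
```

The intuition from the paper's result is that ranks below the ID cannot span the manifold the embeddings actually occupy (hence the perplexity cliff around the estimated ID), while ranks above it mostly add redundant parameters.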