🤖 AI Summary
This work investigates the intrinsic dimension (ID) of token embedding spaces in language models to characterize representational redundancy and to examine the role of ID in model scaling, training dynamics, and efficient fine-tuning. We propose a systematic ID estimation framework based on the TwoNN algorithm, enabling a multi-scale characterization of how ID evolves across the embedding layers of increasingly large models. We find that the ID is substantially lower than the extrinsic embedding dimension, converges rapidly during early training, and is closely tied to the rank needed for effective Low-Rank Adaptation (LoRA). Crucially, setting the LoRA rank near the estimated ID yields a marked perplexity reduction, suggesting that the ID can serve as an interpretable and transferable hyperparameter for efficient adaptation. Our results provide both theoretical insight into the low-dimensional structure of language model representations and practical guidance for designing parameter-efficient fine-tuning methods.
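The TwoNN estimator mentioned above can be sketched in a few lines: for each point, take the ratio μ = r₂/r₁ of its second- to first-nearest-neighbor distances, which under the TwoNN model follows a Pareto law whose shape parameter is the intrinsic dimension, giving the maximum-likelihood estimate d = N / Σᵢ log μᵢ. This is a minimal illustrative sketch, not the paper's implementation; the synthetic data (a 2-D plane embedded in 64 dimensions) is an assumption chosen only to show that the estimate tracks the manifold dimension rather than the extrinsic one.

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    Uses mu_i = r2_i / r1_i, the ratio of each point's second- to
    first-nearest-neighbor distance, and returns N / sum(log mu_i).
    """
    # Squared pairwise distances via the Gram trick (O(N^2) memory).
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # exclude self-distances
    d2.sort(axis=1)
    r1 = np.sqrt(np.clip(d2[:, 0], 0.0, None))  # nearest neighbor
    r2 = np.sqrt(np.clip(d2[:, 1], 0.0, None))  # second nearest
    mu = r2 / r1
    return len(mu) / np.log(mu).sum()

# Sanity check (illustrative): 1000 points on a 2-D Gaussian plane
# rotated into a 64-dimensional ambient space. The estimate should
# land near 2, far below the extrinsic dimension of 64.
rng = np.random.default_rng(0)
coords = rng.normal(size=(1000, 2))
basis = np.linalg.qr(rng.normal(size=(64, 2)))[0]  # 64x2 orthonormal
X = coords @ basis.T
print(f"estimated ID ~ {twonn_id(X):.1f} (extrinsic dim = {X.shape[1]})")
```

Because the estimator only looks at the two nearest neighbors of each point, it is largely insensitive to the curvature and density variations that trip up global methods such as PCA, which is what makes it usable on embedding spaces like those studied here.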
📝 Abstract
In this study, we measure the Intrinsic Dimension (ID) of token embeddings to estimate the dimensions of the manifolds spanned by the representations, and thereby quantify their redundancy relative to their extrinsic dimensionality. Specifically, (1) we estimate the ID of token embeddings in both small-scale language models and modern large language models, finding that the embedding spaces often reside on manifolds of much lower dimension than their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe that the redundancy rate increases as the model scale grows; (3) we track the dynamics of the ID during training and find a rapid drop in the early stages; and (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated ID, suggesting that the ID can serve as a useful guideline for choosing the LoRA rank.
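Finding (4) amounts to choosing the LoRA rank r to match the estimated ID of the embedding layer. The mechanics can be illustrated with a minimal low-rank update in plain NumPy; the shapes, the ID value of 8, and the scaling convention W + (α/r)·BA are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 1000, 64
estimated_id = 8               # hypothetical TwoNN estimate for this layer

W = rng.normal(size=(vocab, d_model))   # frozen embedding matrix
# LoRA adds a trainable low-rank update B @ A; rank r is set to the ID.
r, alpha = estimated_id, 16
A = rng.normal(scale=0.01, size=(r, d_model))
B = np.zeros((vocab, r))                # zero-init so training starts at W
W_adapted = W + (alpha / r) * (B @ A)

# Only A and B are trained: (vocab + d_model) * r parameters instead of
# the full vocab * d_model embedding matrix.
trainable = A.size + B.size
print(f"trainable: {trainable} vs full: {W.size}")
```

The intuition from the paper's result is that ranks below the ID cannot span the manifold the embeddings actually occupy (hence the perplexity cliff around the estimated ID), while ranks above it mostly add redundant parameters.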