🤖 AI Summary
Evaluating and comparing large autoregressive language models remains computationally prohibitive due to their scale and complexity.
Method: This paper proposes representing each model as a log-likelihood vector computed over a fixed text corpus. Squared Euclidean distance in this vector space serves as a scalable approximation of the KL divergence between text-generation distributions, formally justified via its relation to cross-entropy loss, and the vectors form an interpretable, computationally tractable model coordinate system.
Contribution/Results: Leveraging this framework, the paper presents a unified “language model map” encompassing more than 1,000 open-source models. The map reveals capability distributions, familial clustering, and evolutionary trajectories across architectures and training regimes. Theoretically grounded (squared distance approximates KL divergence) and empirically scalable (cost linear in both the number of models and the number of texts), the approach offers an efficient route to large-scale model evaluation, selection, and analysis.
📝 Abstract
To compare autoregressive language models at scale, we propose using log-likelihood vectors computed on a predefined text set as model features. This approach has a solid theoretical basis: when treated as model coordinates, their squared Euclidean distance approximates the Kullback-Leibler divergence of text-generation probabilities. Our method is highly scalable, with computational cost growing linearly in both the number of models and text samples, and is easy to implement as the required features are derived from cross-entropy loss. Applying this method to over 1,000 language models, we constructed a "model map," providing a new perspective on large-scale model analysis.
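The core computation described above is simple: stack each model's per-text log-likelihoods into a vector, then compare models by squared Euclidean distance. The following sketch illustrates this with synthetic numbers standing in for real model outputs; the array shapes and the absence of any normalization step are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Synthetic stand-in data: per-text total log-likelihoods for 3 models on
# 5 texts. In practice each entry would be derived from a model's
# cross-entropy loss on that text (log-likelihood = -loss summed over tokens).
rng = np.random.default_rng(0)
log_liks = rng.normal(loc=-50.0, scale=5.0, size=(3, 5))  # (n_models, n_texts)

# Pairwise squared Euclidean distances between the model feature vectors.
# Per the abstract, this distance approximates the KL divergence between
# the models' text-generation distributions.
diffs = log_liks[:, None, :] - log_liks[None, :, :]
sq_dists = (diffs ** 2).sum(axis=-1)  # (n_models, n_models)

print(sq_dists)
```

The cost scales linearly in both the number of models and the number of texts when building the feature matrix (one forward pass per model-text pair), which is what makes the approach tractable for 1,000+ models; the pairwise distance matrix itself is then a cheap post-processing step.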