🤖 AI Summary
Hugging Face hosts millions of public models, most of them poorly documented, which makes systematic navigation and analysis difficult. To address this, the paper charts a navigable atlas of the model repository, starting from its documented fraction. The atlas visualizes the model landscape and its evolution, and supports applications such as predicting model attributes (e.g., accuracy) and analyzing trends in computer vision models. Because the atlas is still incomplete, the authors also propose a method for charting undocumented regions: it identifies high-confidence structural priors derived from dominant real-world training practices and uses them to map previously undocumented models. Key contributions include: (1) a preliminary atlas of Hugging Face models; (2) public release of the datasets, code, and an interactive atlas; and (3) demonstrations of attribute prediction and trend analysis built on the atlas.
📝 Abstract
As there are now millions of publicly available neural networks, searching and analyzing large model repositories is becoming increasingly important. Navigating so many models requires an atlas, but because most models are poorly documented, charting such an atlas is challenging. To explore the hidden potential of model repositories, we chart a preliminary atlas representing the documented fraction of Hugging Face. It provides stunning visualizations of the model landscape and its evolution. We demonstrate several applications of this atlas, including predicting model attributes (e.g., accuracy) and analyzing trends in computer vision models. However, as the current atlas remains incomplete, we propose a method for charting undocumented regions. Specifically, we identify high-confidence structural priors based on dominant real-world model training practices. Leveraging these priors, our approach enables accurate mapping of previously undocumented areas of the atlas. We publicly release our datasets, code, and interactive atlas.
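To make the idea of predicting model attributes from an atlas more concrete, here is a minimal Python sketch. It is not the paper's method: it assumes the atlas can be represented as a list of (parent, child) edges between models, and it guesses an undocumented attribute by majority vote over a model's documented neighbors. The function name `predict_attribute`, the model names, and the attribute values are all hypothetical.

```python
from collections import Counter


def predict_attribute(edges, known, node):
    """Guess `node`'s attribute by majority vote over its documented neighbors.

    edges: list of (parent, child) pairs, an assumed atlas representation.
    known: dict mapping documented model names to an attribute value.
    Returns None when no neighbor of `node` is documented.
    """
    neighbors = set()
    for parent, child in edges:
        if parent == node:
            neighbors.add(child)
        elif child == node:
            neighbors.add(parent)

    # Count attribute values among neighbors that have documentation.
    votes = Counter(known[n] for n in neighbors if n in known)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy atlas: one base model, three fine-tunes, one derived model.
edges = [
    ("base", "ft_a"),
    ("base", "ft_b"),
    ("base", "ft_c"),
    ("ft_a", "ft_a_merged"),
]
known = {
    "base": "image-classification",
    "ft_a": "image-classification",
    "ft_b": "object-detection",
}

prediction = predict_attribute(edges, known, "ft_c")
```

In this toy example, `ft_c` has no documentation of its own, so its attribute is inferred from its only documented neighbor, `base`. A real atlas would need a more careful treatment of ties, edge direction, and neighbors that are themselves undocumented.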