IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of linear probing, the standard protocol for evaluating self-supervised representations: it is computationally expensive, sensitive to hyperparameters, and reveals little about the geometric structure of the representation space. The paper introduces IdEst, a method that uses intrinsic dimension as an unsupervised, efficient proxy for representation quality. Specifically, IdEst applies the minimum spanning tree-based dimension estimator ($\mathrm{dim}_\mathrm{MST}$) to the learned representations. Experiments across diverse datasets, network architectures, and self-supervised objectives show that IdEst correlates strongly with linear probing performance and enables hyperparameter selection at significantly lower computational cost than supervised alternatives. The results position intrinsic dimension as a geometrically grounded proxy that complements standard supervised probing.

📝 Abstract

Self-supervised learning (SSL) has emerged as a powerful paradigm for learning meaningful representations from unlabeled data. However, the standard protocol for evaluating these representations, linear probing, is computationally expensive, sensitive to hyperparameters, and provides limited insight into the geometric structure of the representation space. In this work, motivated by connections between neural network generalization and intrinsic dimension (ID), we propose IdEst, a method for estimating the ID of SSL representations via the Minimum Spanning Tree dimension estimator ($\mathrm{dim}_\mathrm{MST}$). Across diverse datasets, architectures, and SSL pretraining objectives, we show that IdEst strongly correlates with downstream linear probe performance. Furthermore, we demonstrate that IdEst enables efficient hyperparameter selection, significantly reducing the computational cost compared to supervised alternatives. Our results highlight intrinsic dimensionality as a principled geometric proxy for assessing SSL representations, complementing standard supervised probing protocols.
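To make the idea of an MST-based dimension estimate concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes one common formulation, in which the total length of the Euclidean minimum spanning tree over n sampled points grows roughly like n^((d - alpha)/d) for data of intrinsic dimension d, so d can be read off the slope of a log-log fit. The function names (`mst_length`, `estimate_dim_mst`), the subsample sizes, and the number of trials are all illustrative choices.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points, alpha=1.0):
    """Sum of alpha-powered edge lengths of the Euclidean MST over `points`."""
    dists = squareform(pdist(points))      # dense pairwise distance matrix
    mst = minimum_spanning_tree(dists)     # sparse matrix holding the MST edges
    return float(np.sum(mst.data ** alpha))


def estimate_dim_mst(features, sizes=(200, 400, 800, 1600), alpha=1.0,
                     trials=3, seed=0):
    """Estimate intrinsic dimension from how MST length scales with sample size.

    Assumes (hypothetically, for this sketch) that MST length ~ n^(1 - alpha/d),
    so d = alpha / (1 - slope) where slope comes from a log-log linear fit.
    Requires len(features) >= max(sizes) and a fitted slope below 1.
    """
    rng = np.random.default_rng(seed)
    log_n, log_len = [], []
    for n in sizes:
        lengths = []
        for _ in range(trials):
            # Random subsample of n representations, without replacement.
            idx = rng.choice(len(features), size=n, replace=False)
            lengths.append(mst_length(features[idx], alpha))
        log_n.append(np.log(n))
        log_len.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_n, log_len, 1)
    return alpha / (1.0 - slope)


if __name__ == "__main__":
    # Synthetic stand-in for SSL features: a 2D plane embedded in 10D space.
    rng = np.random.default_rng(1)
    plane = np.zeros((2000, 10))
    plane[:, :2] = rng.random((2000, 2))
    print("estimated dimension:", estimate_dim_mst(plane))
```

In practice `features` would be the encoder outputs for a set of unlabeled inputs. Because the estimate needs no labels and no classifier training, it can be computed per checkpoint or per hyperparameter setting at low cost, which is the property the abstract relies on for hyperparameter selection.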

Problem

Research questions and friction points this paper is trying to address.

self-supervised learning

representation evaluation

intrinsic dimension

linear probing

geometric structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

intrinsic dimension

self-supervised learning

representation evaluation