AI Summary
This work addresses the long-standing challenge of estimating local intrinsic dimensionality (LID) in high-dimensional, complex data. We establish, for the first time, a theoretical connection between the denoising score matching loss and LID: specifically, we prove that the LID is a tight lower bound on this loss, and we connect the equivalent implicit score matching loss to the FLIPD framework. Leveraging this insight, we propose an efficient, scalable LID estimation algorithm that requires only standard score estimation from pretrained diffusion models, without explicit manifold modeling or auxiliary network training. Evaluated on synthetic manifold benchmarks and Stable Diffusion 3.5, our method achieves superior accuracy, computational efficiency, and memory footprint compared to state-of-the-art approaches. It scales effectively to large datasets and quantization-aware scenarios, offering a lightweight, theoretically grounded paradigm for geometric analysis of high-dimensional data.
Abstract
The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has historically been a challenging task. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under varying noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or the use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios.
We show that the LID is a lower bound on the denoising score matching loss, motivating the use of the denoising score matching loss as an LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and a smaller memory footprint as problem size and quantization level increase.
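The abstract's central claim can be checked on a toy case. The following is a minimal sketch, not the paper's method: it assumes a linear d-dimensional "manifold" in R^D carrying a standard Gaussian, so the noised density and its exact score are available in closed form, and it estimates the sigma-squared-scaled denoising score matching loss by Monte Carlo. For a perfect score and small noise, this quantity approaches the LID d, consistent with the stated lower bound.

```python
import numpy as np

# Toy check: for data concentrated on a d-dim subspace of R^D, the
# sigma^2-scaled DSM loss  E || sigma * score(x) + eps ||^2  approaches
# the LID d as sigma -> 0. The linear-subspace setup and all names here
# are illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)
D, d = 10, 3          # ambient dimension, local intrinsic dimension
sigma = 0.05          # small noise level
N = 200_000           # Monte Carlo samples

# Clean data: standard Gaussian on the first d coordinates, zero elsewhere.
x0 = np.zeros((N, D))
x0[:, :d] = rng.standard_normal((N, d))

eps = rng.standard_normal((N, D))
x = x0 + sigma * eps  # noised samples

# Exact score of the noised density N(0, diag(1+sigma^2, ..., sigma^2)):
# variance 1+sigma^2 along the d tangent directions, sigma^2 along the rest.
var = np.concatenate([np.full(d, 1 + sigma**2), np.full(D - d, sigma**2)])
score = -x / var

# sigma^2-scaled denoising score matching loss.
dsm = np.mean(np.sum((sigma * score + eps) ** 2, axis=1))
print(dsm)  # close to d = 3 (analytically d / (1 + sigma^2) in this model)
```

The normal directions contribute nothing (the exact score cancels the noise there), so only the d tangent directions survive, which is why the loss counts the intrinsic dimension. With an imperfect learned score, the loss can only be larger, matching the lower-bound statement.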