🤖 AI Summary
This paper investigates the fundamental learnability limits of high-dimensional simplices under unknown Gaussian noise: given noisy samples drawn uniformly from an unknown $K$-dimensional simplex, we characterize the minimal sample complexity required for $\varepsilon$-accurate estimation. We propose a novel Fourier-analytic approach to distribution recovery, establishing for the first time that when the signal-to-noise ratio satisfies $\mathrm{SNR} \geq \Omega(K^{1/2})$, the sample complexity matches that of the noiseless setting. Leveraging sample compression, information-theoretic lower bounds, and total variation/$\ell_2$-distance analysis, we derive complementary upper and lower bounds: $n \geq (K^2/\varepsilon^2)\, e^{\mathcal{O}(K/\mathrm{SNR}^2)}$ samples suffice, while $n \geq \Omega(K^3\sigma^2/\varepsilon^2 + K/\varepsilon)$ samples are necessary, with matching rates in the noiseless and high-SNR regimes. Our core contribution is uncovering the fundamental role of $\mathrm{SNR}$ in learning high-dimensional geometric structures, and providing the first noise-robust estimation framework that is both theoretically optimal and computationally feasible.
📝 Abstract
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2)\, e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work~\citep{saberi2023sample}, we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
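To make the observation model concrete, the following is a minimal sketch of the data-generating process the abstract describes: points drawn uniformly from a simplex, each corrupted by additive Gaussian noise. For illustration it uses the standard $K$-dimensional probability simplex (uniform sampling via a flat Dirichlet distribution); the paper's simplex is unknown and arbitrary, and the precise $\mathrm{SNR}$ definition used in the paper may differ from the crude proxy computed here.

```python
import numpy as np

def sample_noisy_simplex(n, K, sigma, seed=None):
    """Draw n points uniformly from the standard K-dimensional simplex
    (embedded in R^{K+1}), then corrupt each coordinate with i.i.d.
    Gaussian noise of standard deviation sigma.

    This is an illustrative stand-in for the paper's model: the true
    simplex there is unknown and arbitrary, not the standard one.
    """
    rng = np.random.default_rng(seed)
    # Uniform distribution on the simplex == flat Dirichlet(1, ..., 1).
    clean = rng.dirichlet(np.ones(K + 1), size=n)
    noisy = clean + sigma * rng.normal(size=clean.shape)
    return clean, noisy

clean, noisy = sample_noisy_simplex(n=10_000, K=9, sigma=0.05, seed=0)

# A hypothetical empirical SNR proxy: spread of the clean signal
# relative to the noise standard deviation.
snr_proxy = clean.std() / 0.05
```

The learner only ever sees `noisy`; `clean` is shown here just to make the corruption explicit.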