Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates the fundamental learnability limits of high-dimensional simplices under unknown Gaussian noise: given noisy samples drawn uniformly from an unknown $K$-dimensional simplex, we characterize the minimal sample complexity required for $\varepsilon$-accurate estimation. We propose a novel Fourier-analytic approach to distribution recovery, establishing for the first time that when the signal-to-noise ratio satisfies $\mathrm{SNR} \geq \Omega(K^{1/2})$, the sample complexity matches that of the noiseless setting. Leveraging sample compression, information-theoretic lower bounds, and total variation/$\ell_2$-distance analysis, we derive an upper bound showing that $n \geq (K^2/\varepsilon^2)\, e^{\mathcal{O}(K/\mathrm{SNR}^2)}$ samples suffice, and a lower bound of $n \geq \Omega(K^3\sigma^2/\varepsilon^2 + K/\varepsilon)$, with matching rates in the high-SNR regime. Our core contribution is uncovering the fundamental role of $\mathrm{SNR}$ in learning high-dimensional geometric structures, and providing the first noise-robust estimation framework that is both theoretically optimal and computationally feasible.

📝 Abstract
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work~\citep{saberi2023sample}, we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
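The abstract's sufficient-sample-size condition can be sanity-checked numerically. The sketch below treats the $\mathcal{O}(\cdot)$ constant `c` as a free parameter (its value is not specified in the abstract) and illustrates why $\mathrm{SNR} \ge \Omega(K^{1/2})$ keeps the exponential factor bounded, recovering the noiseless-style $K^2/\varepsilon^2$ rate up to a constant:

```python
import math

def sample_complexity_upper_bound(K, eps, snr, c=1.0):
    """Evaluate n >= (K^2 / eps^2) * exp(c * K / SNR^2),
    the paper's sufficient sample size; c stands in for the
    unspecified O(.) constant (a hypothetical choice here)."""
    return (K**2 / eps**2) * math.exp(c * K / snr**2)

K, eps = 50, 0.1
# Low SNR: the exponent c*K/SNR^2 blows up the bound.
n_low_snr = sample_complexity_upper_bound(K, eps, snr=2.0)
# SNR = sqrt(K): the exponent collapses to the constant c,
# so the bound is K^2/eps^2 times a constant factor.
n_high_snr = sample_complexity_upper_bound(K, eps, snr=math.sqrt(K))
```

With `SNR = sqrt(K)` the result is exactly `(K**2 / eps**2) * e**c`, while the low-SNR evaluation is larger by many orders of magnitude.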
Problem

Research questions and friction points this paper is trying to address.

Estimate high-dimensional simplices from noisy data
Determine sample complexity bounds for simplex learning
Analyze impact of noise on simplex estimation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample compression techniques for noisy data
Fourier-based method for distribution recovery
Information-theoretic bounds for simplex estimation
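The Fourier-based recovery idea can be illustrated in one dimension: additive Gaussian noise multiplies the characteristic function by $e^{-\sigma^2 t^2/2}$, so dividing the empirical characteristic function of the noisy samples by this factor estimates the clean distribution's characteristic function. The snippet below is a minimal toy sketch of this deconvolution principle with known noise variance, not the paper's actual estimator (which handles high-dimensional simplices and unknown variance):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D model: X uniform on [0, 1], observed as Y = X + Gaussian noise.
n, sigma = 100_000, 0.2
x = rng.uniform(0.0, 1.0, n)
y = x + rng.normal(0.0, sigma, n)

# phi_Y(t) = phi_X(t) * exp(-sigma^2 t^2 / 2), so dividing the empirical
# phi_Y by the Gaussian factor recovers phi_X at moderate frequencies.
t = 1.5
phi_y_hat = np.mean(np.exp(1j * t * y))
phi_x_hat = phi_y_hat / np.exp(-0.5 * (sigma * t) ** 2)

# True characteristic function of Uniform[0, 1] for comparison.
phi_x_true = (np.exp(1j * t) - 1) / (1j * t)
```

Note the division amplifies estimation error at high frequencies (the Gaussian factor decays fast), which is why such methods only use low frequencies and why noise level enters the sample complexity.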
Seyed Amir Hossein Saberi
Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
Amir Najafi
imec, Belgium
SoC design · Ultra-low-power on-chip communication · Energy-efficient architectures
Abolfazl Motahari
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
B. Khalaj
Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran