Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the uniform estimation of the means of all functions in a class $\mathcal{F}$ under heavy-tailed distributions, where observations are only assumed to have finite $p$-th moments for some $p \in (1,2]$. For $p < 2$ this relaxes the standard finite-variance assumption. The authors analyze the median-of-means (MoM) estimator and prove a new sample complexity bound using a novel symmetrization technique. They apply the result to two canonical learning tasks, $k$-means clustering with unbounded inputs and linear regression with general loss functions, improving upon existing sample complexity bounds in both settings. The analysis connects classical empirical process theory with robust mean estimation under weak moment assumptions.
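For reference, the MoM estimator, the uniform error the paper controls, and the classical guarantee for a single fixed function can be written as follows. This is a standard sketch with constants omitted, not the paper's uniform bound. The block count $k$, block size $m$, and moment bound $v$ are notation introduced here.

```latex
% Split X_1, ..., X_n into k disjoint blocks B_1, ..., B_k of size m = \lfloor n/k \rfloor.
\bar{f}_j = \frac{1}{m} \sum_{i \in B_j} f(X_i), \qquad
\widehat{\mu}_k(f) = \operatorname{median}\bigl(\bar{f}_1, \dots, \bar{f}_k\bigr)

% Quantity to be controlled uniformly over the class:
\sup_{f \in \mathcal{F}} \bigl| \widehat{\mu}_k(f) - \mathbb{E} f(X) \bigr|

% Single fixed f with E|f(X) - E f(X)|^p <= v, p in (1,2], k \asymp \log(1/\delta), n \gtrsim \log(1/\delta):
\bigl| \widehat{\mu}_k(f) - \mathbb{E} f(X) \bigr|
  \lesssim v^{1/p} \Bigl( \frac{\log(1/\delta)}{n} \Bigr)^{(p-1)/p}
  \quad \text{with probability at least } 1 - \delta
```

For $p = 2$ this reduces to the familiar $\sigma\sqrt{\log(1/\delta)/n}$ rate. The single-function bound follows from a $p$-th moment Markov bound on each block mean (using the von Bahr–Esseen inequality) combined with a binomial tail bound on the number of blocks that deviate.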

📝 Abstract
The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
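A minimal Python/NumPy sketch of the estimator discussed in the abstract follows. It assumes the class $\mathcal{F}$ is finite and given as a list of vectorized Python callables, and it reuses one random block partition for every function. The names `median_of_means` and `uniform_median_of_means` are illustrative and do not come from the paper.

```python
import numpy as np


def _random_blocks(n, n_blocks, rng):
    """Shuffle indices 0..n-1 and split them into n_blocks nearly equal disjoint blocks."""
    if not 1 <= n_blocks <= n:
        raise ValueError("need 1 <= n_blocks <= n")
    return np.array_split(rng.permutation(n), n_blocks)


def _mom(fx, blocks):
    """Median of the block means of the values fx."""
    return float(np.median([fx[idx].mean() for idx in blocks]))


def median_of_means(values, n_blocks, seed=None):
    """MoM estimate of the mean of a 1-D sample."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    return _mom(values, _random_blocks(len(values), n_blocks, rng))


def uniform_median_of_means(sample, functions, n_blocks, seed=None):
    """MoM estimates of E f(X) for every f in a finite class, sharing one partition."""
    sample = np.asarray(sample, dtype=float)
    rng = np.random.default_rng(seed)
    blocks = _random_blocks(len(sample), n_blocks, rng)
    return np.array([_mom(np.asarray(f(sample), dtype=float), blocks) for f in functions])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Lomax (Pareto II) with shape 1.8: mean 1/(1.8 - 1) = 1.25, infinite variance,
    # finite p-th moments only for p < 1.8.
    x = rng.pareto(1.8, size=20_000)
    print("empirical mean:", x.mean())
    print("median-of-means:", median_of_means(x, n_blocks=20, seed=1))
    fs = [lambda t: t, lambda t: np.minimum(t, 5.0)]
    print("uniform MoM:", uniform_median_of_means(x, fs, n_blocks=20, seed=1))
```

With one block the estimator is the empirical mean, and with one point per block it is the sample median. In practice the number of blocks is usually chosen proportional to $\log(1/\delta)$ for a target failure probability $\delta$. Sharing the partition across functions matches the "simultaneous" setting, where a single draw of blocks must work for every $f$ at once.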
Problem

Estimating means of heavy-tailed distributions efficiently
Analyzing Median-of-Means performance for simultaneous mean estimation over a function class
Improving k-means clustering with unbounded inputs and linear regression with general losses
Innovation

Median-of-Means for heavy-tailed distributions
Novel symmetrization technique yielding a new sample complexity bound
Improved k-means and linear regression applications
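The sketch below illustrates the $k$-means application in its simplest form. Among a finite list of candidate center sets, it picks the one whose MoM estimate of the $k$-means risk $\mathbb{E}\min_j \lVert X - c_j \rVert^2$ is smallest. This selection rule is a hypothetical illustration of MoM-based risk estimation, not the algorithm or guarantee from the paper. The names `mom_select_centers` and `kmeans_losses` and their parameters were chosen for this example.

```python
import numpy as np


def kmeans_losses(X, centers):
    """Squared distance from each row of X (n, d) to its nearest center (k, d); shape (n,)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1)


def mom_select_centers(X, candidates, n_blocks, seed=None):
    """Return (index of candidate with smallest MoM risk estimate, array of MoM risks)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= n_blocks <= len(X):
        raise ValueError("need 1 <= n_blocks <= len(X)")
    rng = np.random.default_rng(seed)
    # One random partition shared by all candidates.
    blocks = np.array_split(rng.permutation(len(X)), n_blocks)
    risks = []
    for centers in candidates:
        losses = kmeans_losses(X, np.asarray(centers, dtype=float))
        risks.append(float(np.median([losses[idx].mean() for idx in blocks])))
    risks = np.array(risks)
    return int(np.argmin(risks)), risks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two clusters with Student-t (df = 3) noise: the squared-distance loss then has
    # finite p-th moments only for p < 1.5.
    labels = rng.integers(0, 2, size=2_000)
    X = np.array([[0.0, 0.0], [8.0, 8.0]])[labels] + rng.standard_t(3.0, size=(2_000, 2))
    candidates = [np.array([[0.0, 0.0], [8.0, 8.0]]), np.array([[4.0, 4.0], [9.0, 9.0]])]
    best, risks = mom_select_centers(X, candidates, n_blocks=15, seed=1)
    print("selected candidate:", best, "MoM risks:", risks)
```

Estimating every candidate's risk with the same partition is exactly the kind of simultaneous estimation over a function class (here, the losses indexed by center sets) that a uniform MoM bound is meant to control.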