Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means

📅 2025-06-17

🤖 AI Summary

This paper addresses uniform estimation of the means of all functions in a class $\mathcal{F}$ under heavy-tailed distributions, where observations possess only finite $p$-th moments for $p \in (1,2]$, so the standard finite-variance assumption may fail. To tackle this challenge, we propose a unified estimation framework combining the median-of-means (MoM) principle with a novel symmetrization technique. This is the first work to establish a rigorous theory for uniform mean estimation under moment conditions with $p \in (1,2]$. The proposed method enhances robustness in two canonical learning tasks: $k$-means clustering with unbounded inputs and linear regression with general loss functions. We derive tight upper bounds on sample complexity that improve upon the best-known results in both settings, while maintaining statistical efficiency and computational feasibility. Our analysis bridges a gap between classical empirical process theory and modern robust statistics under minimal moment assumptions.
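For orientation, the estimator and the uniform guarantee described above can be written as follows. This is the standard textbook form of MoM, not notation taken from the paper: the blocks $B_1, \dots, B_\kappa$, the block size $m$, the accuracy $\varepsilon$, and the confidence level $\delta$ are assumed symbols introduced here for illustration.

$$
\hat{\mu}_{\mathrm{MoM}}(f) = \operatorname{median}\left( \frac{1}{m} \sum_{i \in B_1} f(X_i), \; \dots, \; \frac{1}{m} \sum_{i \in B_\kappa} f(X_i) \right),
\qquad
\Pr\left[ \sup_{f \in \mathcal{F}} \left| \hat{\mu}_{\mathrm{MoM}}(f) - \mathbb{E}[f(X)] \right| \le \varepsilon \right] \ge 1 - \delta .
$$

Here the sample $X_1, \dots, X_n$ is split into $\kappa$ disjoint blocks of size $m$, and the sample complexity question is how large $n$ must be for the right-hand guarantee to hold when $f(X)$ has only a finite $p$-th moment.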

📝 Abstract

The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
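To make the estimator concrete, here is a minimal Python sketch of MoM applied to every function in a small finite class. It is an illustration under stated assumptions, not the paper's procedure: the helper names `median_of_means` and `uniform_mom`, the choice of 20 blocks, the three example functions, and the Pareto sample with shape 1.5 (finite mean, infinite variance) are all hypothetical choices made here.

```python
import math
import random
import statistics


def median_of_means(values, num_blocks):
    """Median-of-means estimate of the mean of a one-dimensional sample."""
    values = list(values)
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and the sample size")
    # Use equal-sized contiguous blocks; leftover points at the end are dropped.
    block_size = len(values) // num_blocks
    block_means = [
        statistics.fmean(values[j * block_size:(j + 1) * block_size])
        for j in range(num_blocks)
    ]
    # The median of the block means is insensitive to a few corrupted blocks.
    return statistics.median(block_means)


def uniform_mom(sample, functions, num_blocks):
    """Apply MoM to f(X_1), ..., f(X_n) for each f in a finite class."""
    return [
        median_of_means([f(x) for x in sample], num_blocks)
        for f in functions
    ]


# Assumed toy data: Pareto with shape 1.5 has a finite mean but infinite variance.
random.seed(0)
sample = [random.paretovariate(1.5) for _ in range(10_000)]

# Assumed toy function class (hypothetical, for illustration only).
functions = [lambda x: x, lambda x: min(x, 10.0), math.sqrt]

estimates = uniform_mom(sample, functions, num_blocks=20)
for index, estimate in enumerate(estimates):
    print(f"f_{index}: MoM estimate = {estimate:.3f}")
```

The same blocks are reused for every function, which is what makes the estimate simultaneous over the class. The number of blocks trades robustness against per-block accuracy, and the paper's bound concerns how the required sample size scales for a general class $\mathcal{F}$ rather than a finite list like this one.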

Problem

Research questions and friction points this paper is trying to address.

Efficiently estimating means under heavy-tailed distributions

Analyzing Median-of-Means performance in multi-function mean estimation

Improving guarantees for $k$-means clustering with unbounded inputs and linear regression with general losses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Median-of-Means for heavy-tailed distributions

Novel symmetrization technique for estimation

Improved guarantees for $k$-means clustering and linear regression
