🤖 AI Summary
This paper addresses uniform mean estimation for a class of functions $\mathcal{F}$ under heavy-tailed distributions. The observations are only assumed to have finite $p$-th moments for some $p \in (1,2]$. For $p < 2$, this rules out the finite-variance assumption used in standard analyses. The paper analyzes the median-of-means (MoM) estimator and proves a new sample complexity bound for estimating the means of all functions in $\mathcal{F}$ simultaneously. The proof relies on a novel symmetrization technique. The result is then applied to two learning tasks, $k$-means clustering with unbounded inputs and linear regression with general loss functions, and improves upon existing results in both settings. The analysis connects tools from classical empirical process theory with robust mean estimation under weak moment assumptions.
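For reference, here is one standard way to write the MoM estimator and the uniform error it is judged by. This assumes $N = km$ i.i.d. samples $X_1, \ldots, X_N$ split into $k$ disjoint blocks $B_1, \ldots, B_k$ of equal size $m$. The paper's exact blocking scheme and choice of $k$ may differ. The symbols $\widehat{\mu}_k$ and $\Delta_N$ are notation introduced here for illustration.

$$
\widehat{\mu}_{k}(f) = \operatorname{median}\left( \frac{1}{m}\sum_{i \in B_1} f(X_i), \ldots, \frac{1}{m}\sum_{i \in B_k} f(X_i) \right),
\qquad
\Delta_N(\mathcal{F}) = \sup_{f \in \mathcal{F}} \left| \widehat{\mu}_{k}(f) - \mathbb{E} f(X) \right|.
$$

In this notation, a sample complexity bound states how large $N$ must be so that $\Delta_N(\mathcal{F}) \le \varepsilon$ with probability at least $1 - \delta$.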
📝 Abstract
The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
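The following is a minimal sketch of the plain MoM estimator applied to every function in a finite class. It assumes i.i.d. real-valued data and equal-size consecutive blocks, with one shared partition reused for all functions. The helper names `median_of_means` and `mom_over_class`, the Pareto test distribution, and the class $f_t(x) = |x - t|$ are hypothetical choices for illustration. None of them is taken from the paper, and the sketch does not reproduce its analysis or its choice of $k$.

```python
import random
import statistics
from typing import Callable, Dict, Sequence


def median_of_means(values: Sequence[float], k: int) -> float:
    """Median of the means of k disjoint, equal-size consecutive blocks.

    Assumes i.i.d. values, so consecutive blocks behave like random ones.
    The last len(values) % k values are dropped so every block has size m.
    """
    if not 1 <= k <= len(values):
        raise ValueError("need 1 <= k <= len(values)")
    m = len(values) // k  # block size
    block_means = [sum(values[j * m:(j + 1) * m]) / m for j in range(k)]
    return statistics.median(block_means)


def mom_over_class(
    sample: Sequence[float],
    functions: Dict[str, Callable[[float], float]],
    k: int,
) -> Dict[str, float]:
    """MoM estimate of E f(X) for every f in a finite class.

    The same block partition of the sample is reused for every function,
    mirroring the usual uniform (simultaneous) MoM setup.
    """
    return {name: median_of_means([f(x) for x in sample], k)
            for name, f in functions.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto(shape=1.5, scale=1): mean 3, infinite variance,
    # finite p-th moments only for p < 1.5.
    sample = [rng.paretovariate(1.5) for _ in range(30_000)]
    # Hypothetical class f_t(x) = |x - t|; since X >= 1, E f_t(X) = 3 - t.
    funcs = {f"t={t}": (lambda x, t=t: abs(x - t)) for t in [0.0, 0.5, 1.0]}
    mom = mom_over_class(sample, funcs, k=15)
    for name, f in funcs.items():
        plain = sum(f(x) for x in sample) / len(sample)
        print(name, "MoM:", round(mom[name], 3), "empirical mean:", round(plain, 3))
```

On data like this, the empirical mean is driven by a few very large draws, while a single outlying block cannot move the median of the block means. The number of blocks $k$ trades robustness (more blocks) against per-block accuracy (larger blocks). In standard single-function analyses, $k$ is taken of order $\log(1/\delta)$ for confidence level $1 - \delta$.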