🤖 AI Summary
This paper investigates the statistical and computational complexity of estimating the unknown projection direction $\boldsymbol{w}_*$ in the single-index model (SIM), where the input $\boldsymbol{x} \in \mathbb{R}^d$ follows an arbitrary spherically symmetric distribution. Methodologically, it introduces spherical harmonics, rather than the classical Hermite basis, as the natural orthogonal system for this problem, and leverages tensor expansions together with the model's rotational invariance. Theoretically, it establishes a unified characterization of the fundamental learning limits across all spherically symmetric input distributions. It proposes two complementary estimators: one achieving optimal sample complexity, the other optimal runtime, together with an argument that this trade-off may be inherent. In the Gaussian case, it recovers and strengthens prior results while uncovering previously overlooked non-asymptotic phase transitions. Altogether, this work delivers the first systematic complexity theory for SIMs under general spherical symmetry, combining spherical harmonic analysis, spectral methods, and invariance arguments to bridge statistical optimality and computational feasibility.
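For concreteness, spherical symmetry means the input factorizes as an independent radius times a direction uniform on the sphere. A minimal numpy sketch of sampling such data and the corresponding SIM labels (the radius law and link function below are arbitrary illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5
w_star = np.zeros(d)
w_star[0] = 1.0  # hidden unit-norm direction

# Any spherically symmetric x can be written x = r * u, with u uniform
# on the sphere S^{d-1} and r an independent radius. A chi-distributed
# radius would give a Gaussian; here we pick a heavier-tailed radius
# purely for illustration.
u = rng.standard_normal((n, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform directions
r = rng.pareto(3.0, size=n) + 1.0               # illustrative radius law
x = r[:, None] * u

# Single-index labels: y depends on x only through <w_*, x>.
link = lambda t: t**2 + 0.5 * t                 # hypothetical link function
y = link(x @ w_star)
```

Rotating `x` by any orthogonal matrix leaves its distribution unchanged, which is exactly the symmetry the spherical-harmonic basis is adapted to.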
📝 Abstract
We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*, \boldsymbol{x} \rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite expansion of the link function. In this paper, we propose a new perspective: we argue that "spherical harmonics" -- rather than "Hermite polynomials" -- provide the natural basis for this problem, as they capture its intrinsic "rotational symmetry". Building on this insight, we characterize the complexity of learning single-index models under arbitrary spherically symmetric input distributions. We introduce two families of estimators -- based on tensor unfolding and online SGD -- that respectively achieve either optimal sample complexity or optimal runtime, and argue that estimators achieving both may not exist in general. When specialized to Gaussian inputs, our theory not only recovers and clarifies existing results but also reveals new phenomena that had previously been overlooked.
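The two estimator families can be sketched in numpy in the easiest Gaussian special case. The link function, step size, and the reduction of tensor unfolding to its order-2 (spectral) instance are all illustrative assumptions here, not the paper's exact constructions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 20000
w_star = np.zeros(d)
w_star[0] = 1.0  # hidden unit-norm direction

# Gaussian inputs (a special case of spherical symmetry) and a
# hypothetical link with both an odd and an even component.
link = lambda t: t + 0.5 * t**2

# --- Estimator 1: order-2 "tensor unfolding" (a spectral method) ----
# The even part of the link makes w_* the top eigenvector of
# E[y x x^T]; for order 2 the unfolded tensor is just this matrix.
x = rng.standard_normal((n, d))
y = link(x @ w_star)
M = (x * y[:, None]).T @ x / n
w_spec = np.linalg.eigh(M)[1][:, -1]            # top eigenvector

# --- Estimator 2: online SGD on the correlation loss ----------------
# One fresh sample per step: a gradient step on -y * <w, x>, followed
# by projection back onto the unit sphere.
w_sgd = rng.standard_normal(d)
w_sgd /= np.linalg.norm(w_sgd)
eta = 1e-3
for _ in range(n):
    xi = rng.standard_normal(d)
    yi = link(xi @ w_star)
    w_sgd += eta * yi * xi                      # step on -y <w, x>
    w_sgd /= np.linalg.norm(w_sgd)              # spherical projection

# Alignment with the hidden direction (up to sign); both are
# typically close to 1 in this easy regime.
overlap_spec = abs(w_spec @ w_star)
overlap_sgd = abs(w_sgd @ w_star)
```

This toy setting hides the interesting regime: the paper's point is how the sample complexity of the first approach and the runtime of the second scale when low-degree harmonics of the link vanish, and that no estimator may attain both optima simultaneously.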