🤖 AI Summary
This study addresses high-dimensional sufficient dimension reduction, where existing gradient-based estimators are hindered either by the curse of dimensionality or by a per-outer-iteration cost that is at least quadratic in the sample size. The authors show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and they recast the empirical MAVE criterion as a smooth maximization problem on the Stiefel manifold with a closed-form Riemannian gradient. Building on this, they propose SMAVE, which combines sparse nearest-neighbor localization in the projected space with Riemannian stochastic gradient ascent. A simplified version of the algorithm comes with almost-sure convergence and a non-asymptotic rate matching the standard scaling for non-convex stochastic first-order methods. Empirically, SMAVE matches or improves on RMAVE's subspace recovery in moderate- to high-dimensional synthetic settings; on four real-world datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE, at orders of magnitude lower runtime.
📝 Abstract
Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer from the curse of dimensionality, or localize in the reduced space at a per-outer-iteration cost at least quadratic in the sample size. We show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and recast the empirical criterion as a smooth maximization on the Stiefel manifold with a closed-form Riemannian gradient. The resulting algorithm, SMAVE, combines sparse projected-space nearest-neighbor localization with Riemannian stochastic gradient ascent. A simplified version comes with almost-sure convergence and a non-asymptotic rate matching the standard non-convex stochastic first-order scaling. Empirically, SMAVE matches or improves on RMAVE's synthetic subspace recovery at moderate-to-high ambient dimension, and on four real datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE at orders of magnitude lower runtime.
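
To make the phrase "Riemannian stochastic gradient ascent on the Stiefel manifold" concrete, the following is a minimal NumPy sketch of the generic update: project a mini-batch Euclidean gradient onto the tangent space at the current orthonormal basis B, take a step, and retract back to the manifold. It is not the SMAVE algorithm. The objective here is a toy stand-in, tr(BᵀSB)/2 with S a mini-batch second-moment matrix, rather than the MAVE criterion; the nearest-neighbor localization is omitted; and the QR retraction, the 1/√t step size, and all function names (`riemannian_grad`, `qr_retract`, `stiefel_sga`) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)


def riemannian_grad(B, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold {B : B^T B = I} at B (embedded metric)."""
    return G - B @ sym(B.T @ G)


def qr_retract(Y):
    """Map a full-rank p x d matrix back to the Stiefel manifold via thin QR.
    (Assumed retraction; the abstract does not say which one is used.)"""
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the result does not depend on the QR sign convention.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def stiefel_sga(X, d, steps=2000, batch=32, lr=0.05, seed=0):
    """Riemannian stochastic gradient ascent on a TOY objective
    f(B) = tr(B^T S B) / 2, where S is the second-moment matrix of X.
    This stands in for the MAVE criterion, which is not reproduced here."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B = qr_retract(rng.standard_normal((p, d)))  # random orthonormal start
    for t in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        Xb = X[idx]
        G = Xb.T @ (Xb @ B) / batch       # mini-batch Euclidean gradient S_b B
        step = lr / np.sqrt(t + 1)        # decaying step size (assumption)
        B = qr_retract(B + step * riemannian_grad(B, G))
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scales = np.array([5.0, 4.0] + [0.5] * 8)
    X = rng.standard_normal((2000, 10)) * scales
    B = stiefel_sga(X, d=2)
    U = np.eye(10)[:, :2]  # true dominant subspace of the toy problem
    print("projection distance:", np.linalg.norm(B @ B.T - U @ U.T))
```

Because the toy objective depends on B only through its column space, the comparison at the end uses the projection matrix BBᵀ rather than B itself, which mirrors the Grassmannian view of the target mentioned in the abstract.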