🤖 AI Summary
This work targets the high statistical risk of high-dimensional covariance estimation by recasting covariance shrinkage as empirical risk minimization over a parametric stochastic interpolant between a source and a target distribution. The framework recovers known shrinkage estimators as special cases and identifies three mechanisms for reducing risk: the choice of interpolant schedule; couplings (for example, solutions of optimal transport problems) realized by non-linear flow maps, which enable eigenvector regularization; and early stopping when integrating a regressed vector field. The authors propose a neural estimator of the interpolant, bound its quadratic risk in terms of the interpolant approximation error, validate both on synthetic experiments, and apply the estimator to real neuroimaging data to illustrate the additional regularization it provides.
📝 Abstract
We recast classical shrinkage of high-dimensional covariance estimators as empirical risk minimization over a parametric stochastic interpolant between a source and a target distribution. This formalism recovers known shrinkage estimators as special cases and reveals three distinct mechanisms for reducing statistical risk: (i) Scheduling: the interpolant schedule determines the class of admissible covariances, and hence the achievable risk. (ii) Flow maps and couplings: whereas naive constructions amount to assuming independence between the distributions, specific coupling structures (e.g., solutions of optimal transport problems) can lower the empirical risk. Moreover, non-linear flow maps realizing such couplings free the interpolant covariance from the eigenbasis of the empirical estimate, enabling eigenvector regularization. (iii) Early stopping: estimators defined by integrating a regressed vector field afford an additional bias-variance trade-off through approximation of the true interpolant distribution. We then propose a neural estimator of the interpolant, together with an upper bound on its quadratic risk in terms of the interpolant approximation error, and validate both on synthetic experiments. Finally, we apply the estimator to real neuroimaging data, demonstrating the additional regularization power this approach offers in practice.
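To make the "special case" claim concrete, here is a minimal NumPy sketch of the naive, independent construction. It is an illustration under stated assumptions, not the paper's estimator: we assume an interpolant of the form x_t = a·x0 + b·x1, take x0 from an isotropic reference distribution with covariance μI and x1 from the data with empirical covariance S, and set a² = λ, b² = 1 − λ. The names `interpolant_covariance`, `lam`, and `mu`, and the choice of which variable plays the source or the target, are hypothetical. Under independence, the interpolant covariance reduces to the familiar linear shrinkage form (1 − λ)S + λμI.

```python
import numpy as np

def interpolant_covariance(S, T, a, b, C=None):
    """Covariance of x_t = a*x0 + b*x1 (assumed interpolant form).

    T = Cov(x0), S = Cov(x1), C = Cov(x0, x1) is the cross-covariance
    induced by the coupling. C=None means an independent coupling.
    """
    if C is None:
        C = np.zeros_like(S)
    # Cov(a*x0 + b*x1) = a^2 T + b^2 S + a*b*(C + C^T)
    return a**2 * T + b**2 * S + a * b * (C + C.T)

# Synthetic high-dimensional setting: fewer samples than dimensions.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)   # empirical covariance (singular since n < p)
mu = np.trace(S) / p          # average eigenvalue of S
T = mu * np.eye(p)            # isotropic reference covariance (assumption)

lam = 0.3                     # hypothetical shrinkage intensity
a, b = np.sqrt(lam), np.sqrt(1.0 - lam)

Sigma = interpolant_covariance(S, T, a, b)
linear_shrinkage = (1.0 - lam) * S + lam * mu * np.eye(p)

print("matches linear shrinkage:", np.allclose(Sigma, linear_shrinkage))
print("smallest eigenvalue of S:    ", np.linalg.eigvalsh(S).min())
print("smallest eigenvalue of Sigma:", np.linalg.eigvalsh(Sigma).min())
```

In this sketch the result commutes with S, so it shares the eigenvectors of the empirical estimate and only its eigenvalues are shrunk. A non-zero cross-covariance `C` is where a coupling would enter; it changes the covariance beyond what the schedule alone allows, although with a linear map it still need not leave the empirical eigenbasis.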