Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

📅 2024-08-14
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This paper addresses learning high-dimensional multi-index models under the assumption that the data exhibit an underlying low-dimensional structure, aiming to control both sample and computational complexity. We analyze a mean-field Langevin algorithm for a two-layer neural network and, to improve computational complexity, study a setting where the weights are constrained to a compact Riemannian manifold of positive Ricci curvature, such as the unit hypersphere. Theoretically, we characterize an effective dimension dictated by the intrinsic structure of the data; sample complexity scales nearly linearly in this dimension, while worst-case computational complexity may still grow exponentially with it. Under the curvature constraint, we identify assumptions under which polynomial-time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity. Our key contribution unifies tools from differential geometry (Ricci curvature analysis), high-dimensional statistical learning, and mean-field optimization, yielding a framework for structured high-dimensional nonconvex learning that pursues statistical efficiency and computational tractability simultaneously.
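
To make the training procedure concrete, here is a minimal NumPy sketch of mean-field Langevin dynamics for a two-layer network whose first-layer weights are kept on the unit sphere. Everything here is an illustrative assumption rather than the paper's exact algorithm: the hyperparameters, the tanh activation, the synthetic single-index target, and the normalization-based retraction (a simplification of an intrinsic manifold step).

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 50, 512, 2000   # ambient dimension, neurons (particles), samples
lr, lam = 1e-2, 1e-3      # step size and entropic regularization (temperature)

# Synthetic single-index target as a stand-in for latent low-dimensional structure.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = rng.standard_normal((n, d))
y = np.tanh(X @ u)

# First-layer weights initialized uniformly on the sphere S^{d-1};
# second-layer weights fixed at 1/m (mean-field scaling).
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = np.full(m, 1.0 / m)

def predict(W, X):
    return np.tanh(X @ W.T) @ a

for step in range(2000):
    resid = predict(W, X) - y                                # (n,)
    act = 1.0 - np.tanh(X @ W.T) ** 2                        # (n, m), tanh'
    grad = (act * resid[:, None]).T @ X * (a[:, None] / n)   # (m, d)
    # Langevin update: noisy gradient step with temperature lam ...
    noise = rng.standard_normal((m, d))
    W = W - lr * grad + np.sqrt(2.0 * lr * lam) * noise
    # ... followed by retraction onto the sphere (plain renormalization here).
    W /= np.linalg.norm(W, axis=1, keepdims=True)

print("final mse:", np.mean((predict(W, X) - y) ** 2))
```

Each neuron plays the role of a particle approximating the mean-field distribution over weights; the Gaussian noise implements the entropic regularization, and the renormalization enforces the spherical constraint discussed above.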

📝 Abstract
We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures. When the data exhibit such a structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information and generative exponents that appeared in recent analyses of gradient-based feature learning. On the other hand, the computational complexity may inevitably grow exponentially with $d_{\mathrm{eff}}$ in the worst-case scenario. Motivated by improving computational complexity, we take the first steps towards polynomial time convergence of the mean-field Langevin algorithm by investigating a setting where the weights are constrained to be on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we study assumptions under which polynomial time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.
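
For reference, the multi-index setting described above can be written in its standard form. This is a generic statement under assumed notation ($g$, $u_1, \dots, u_k$); the paper's precise assumptions and its definition of $d_{\mathrm{eff}}$ may differ in detail.

```latex
% Illustrative statement of the multi-index setting (a standard form,
% not necessarily the paper's exact notation or assumptions).
y = g\bigl(\langle u_1, x \rangle, \dots, \langle u_k, x \rangle\bigr) + \varepsilon,
\qquad x \in \mathbb{R}^d, \quad k \ll d.
```

Here $g : \mathbb{R}^k \to \mathbb{R}$ is an unknown link function and the unknown directions $u_1, \dots, u_k$ span a latent $k$-dimensional subspace. When the data concentrate on such a structure, $d_{\mathrm{eff}}$ can be far below the ambient dimension $d$, which is what makes a sample complexity growing almost linearly in $d_{\mathrm{eff}}$ meaningful.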
Problem

Research questions and friction points this paper is trying to address.

Learning high-dimensional multi-index models with neural networks
Characterizing effective dimension for sample and computational complexity
Improving computational complexity via constrained weight optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-layer neural network trained with mean-field Langevin dynamics
Adaptive learning of low-dimensional structures
Polynomial-time convergence for weights constrained to compact manifolds with positive Ricci curvature (see the sketch below)
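
The constrained update behind the last point can be sketched as a single Langevin step intrinsic to the sphere: gradient and noise are projected onto the tangent space at the current iterate, and the result is retracted back to the sphere. The function name and the normalization-based retraction are illustrative choices, not the paper's exact geometric step.

```python
import numpy as np

def sphere_langevin_step(w, grad, lr, lam, rng):
    """One Langevin step for a weight w on the unit sphere S^{d-1}.

    Gradient and noise are projected onto the tangent space at w; the
    iterate is then retracted to the sphere by normalization (a common
    retraction; the paper's intrinsic step may differ).
    """
    proj = lambda v: v - np.dot(v, w) * w            # tangent projection at w
    noise = rng.standard_normal(w.shape)
    w_new = w - lr * proj(grad) + np.sqrt(2.0 * lr * lam) * proj(noise)
    return w_new / np.linalg.norm(w_new)
```

Projecting the noise keeps the perturbation tangent to the manifold, so the dynamics never leave the constraint set, which is the mechanism the curvature-based convergence analysis relies on.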