Learning Multi-Index Models with Hyper-Kernel Ridge Regression

📅 2025-10-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates whether the compositional structure of learning tasks underlies the superiority of deep networks over classical models such as kernel methods. To formalize compositional structure, we introduce the Multi-Index Model (MIM) as a minimal yet expressive benchmark. We then propose Hyperkernel Ridge Regression (HKRR), the first kernel-based method adaptively extended to MIM learning, which integrates neural networks’ structural inductive biases with kernel methods’ theoretical interpretability. We design specialized optimization algorithms based on alternating minimization and alternating gradients, and derive a tight sample complexity upper bound for HKRR. Theoretically and empirically, HKRR significantly outperforms standard kernel methods on MIM tasks, effectively mitigating the curse of dimensionality. Our results elucidate how structural priors enhance high-dimensional learning performance and offer a novel perspective on the origins of deep learning’s empirical success.

Technology Category

Application Category

📝 Abstract
Deep neural networks excel in high-dimensional problems, outperforming models such as kernel methods, which suffer from the curse of dimensionality. However, the theoretical foundations of this success remain poorly understood. We follow the idea that the compositional structure of the learning task is the key factor determining when deep networks outperform other approaches. Taking a step towards formalizing this idea, we consider a simple compositional model, namely the multi-index model (MIM). In this context, we introduce and study hyper-kernel ridge regression (HKRR), an approach blending neural networks and kernel methods. Our main contribution is a sample complexity result demonstrating that HKRR can adaptively learn MIM, overcoming the curse of dimensionality. Further, we exploit the kernel nature of the estimator to develop ad hoc optimization approaches. Indeed, we contrast alternating minimization and alternating gradient methods both theoretically and numerically. These numerical results complement and reinforce our theoretical findings.
Problem

Research questions and friction points this paper is trying to address.

Learning multi-index models with hyper-kernel ridge regression
Overcoming the curse of dimensionality in kernel methods
Theoretical analysis of deep networks versus kernel methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyper-kernel ridge regression blends neural networks and kernels
Adaptively learns multi-index models overcoming dimensionality curse
Uses alternating minimization and gradient methods for optimization
🔎 Similar Papers
No similar papers found.
S
Shuo Huang
Istituto Italiano di Tecnologia, Genoa, Italy
H
Hippolyte Labarrière
MaLGa - DIBRIS - Università di Genova, Genoa, Italy
Ernesto De Vito
Ernesto De Vito
MaLGa Machine Learning Genoa Center - Università di Genova
Signal analysismachine learningharmonic analysisprobability
T
Tomaso Poggio
CBMM - Massachusetts Institute of Technology, Cambridge, MA, USA
Lorenzo Rosasco
Lorenzo Rosasco
MaLGa Machine Learning Genoa Center - Università degli Studi di Genova
learning theorymachine learning