AI Summary
This work addresses the degradation of convergence rates in meta-learning under nonlinear shared representations, a limitation unaddressed by existing theory, which assumes linear representations. In practice, nonlinear features induce a per-task bias that cannot be averaged out, so the convergence speedup expected as the number of tasks $N$ grows vanishes. We propose the first theoretical framework for meta-learning in infinite-dimensional reproducing kernel Hilbert spaces (RKHS), combining task-wise regularization with smoothness-driven bias control to model nonlinear shared representations. We establish, for the first time, that appropriate regularization can fully mitigate the adverse impact of nonlinear bias: the convergence rate improves significantly with $N$, not only recovering meta-learning acceleration but also strictly outperforming the linear-representation baseline. This result breaks the long-standing reliance on linearity assumptions in meta-learning theory and provides foundational theoretical support for deep meta-learning.
Abstract
Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- may scale with the number $N$ of tasks (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions,