Provable Data Scaling Law for Meta Learning via Complexity Minimization

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap between existing pre-training theory and a key empirical observation: downstream sample complexity drops markedly as pre-training data is scaled up. To bridge this gap, the authors propose complexity minimization, a meta-representation learning framework that learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst case of this complexity across source domains. An end-to-end analysis, from pre-training through downstream regression, yields a provable data scaling law that links the amount of meta-training data to the few-shot adaptation error on downstream tasks, positioning complexity minimization as a new principle for meta-representation learning. The theory shows that downstream error decreases as meta-training data grows, and experiments show that adding complexity regularization to existing meta-learning methods consistently improves their downstream sample efficiency.

📝 Abstract

Pre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior, which learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst case of this complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures the scaling behavior; in particular, we show that the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.
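The abstract does not say how "downstream model complexity" is measured, so the following is only a hypothetical sketch of the worst-case objective it describes, not the paper's method. It assumes a linear representation x -> x @ B, linear regression heads, and a stand-in complexity proxy per source domain: the optimal value of a ridge-regularized least-squares fit, (1/n)||F w - y||^2 + lam * ||w||^2, which charges both for misfit and for head norm. The worst case over domains is then added to an arbitrary base meta-learning loss as a regularizer. The names `domain_score`, `worst_case_score`, `regularized_meta_loss`, `lam` and `reg` are illustrative assumptions.

```python
import numpy as np


def domain_score(features, targets, lam=1e-3):
    """Hypothetical per-domain complexity proxy: optimal value of
    (1/n)||F w - y||^2 + lam * ||w||^2 over linear heads w (ridge regression)."""
    n, k = features.shape
    # Closed-form ridge head: (F^T F / n + lam I) w = F^T y / n
    gram = features.T @ features / n + lam * np.eye(k)
    w = np.linalg.solve(gram, features.T @ targets / n)
    residual = features @ w - targets
    return float(residual @ residual / n + lam * (w @ w))


def worst_case_score(B, domains, lam=1e-3):
    """Maximum of the per-domain proxy over source domains for x -> x @ B."""
    return max(domain_score(X @ B, y, lam) for X, y in domains)


def regularized_meta_loss(base_loss, B, domains, reg=0.1, lam=1e-3):
    """Hypothetical complexity-regularized objective: an existing
    meta-learning loss plus a weighted worst-case complexity term."""
    return base_loss + reg * worst_case_score(B, domains, lam)


# Toy data: d = 5 input dims; every domain's target depends only on the first k = 2 dims.
rng = np.random.default_rng(0)
d, k, n = 5, 2, 50
domains = []
for _ in range(3):
    X = rng.normal(size=(n, d))
    w_task = rng.normal(size=k)
    domains.append((X, X[:, :k] @ w_task))

B_good = np.eye(d)[:, :k]   # projects onto the relevant coordinates
B_bad = np.eye(d)[:, -k:]   # projects onto irrelevant coordinates

good = worst_case_score(B_good, domains)
bad = worst_case_score(B_bad, domains)
print(f"worst-case score: aligned={good:.4f} misaligned={bad:.4f}")
```

On this toy data the representation aligned with the relevant coordinates should receive a far smaller worst-case score than the misaligned one. An actual method would minimize this score over B (for example, by gradient descent), which the sketch omits; it only evaluates two fixed candidates.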

Problem

Research questions and friction points this paper is trying to address.

meta-learning

data scaling law

pre-training

sample complexity

representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

complexity minimization

meta-learning

data scaling law