🤖 AI Summary
This work investigates the intrinsic mitigating effect of overparameterization on catastrophic forgetting in continual learning. Without any explicit anti-forgetting mechanism, such as regularization or experience replay, we analyze linear regression models under gradient descent dynamics and establish a non-asymptotic theoretical guarantee that overparameterization alone substantially alleviates forgetting. Along the way, we derive a tight non-asymptotic upper bound on the risk of a single linear regression task, which is of independent interest to double descent theory. Extending to a two-task setting motivated by permutation tasks, we prove that when the overparameterization ratio is sufficiently high, a model trained sequentially on both tasks remains a low-risk estimator for the first task. The results are tight and empirically verifiable, indicating that model capacity itself confers robustness against forgetting in continual learning.
📝 Abstract
Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound on the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.
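The setting described above can be illustrated with a minimal NumPy simulation. This is a sketch under my own assumptions, not the paper's construction or experiment: the task sizes, the unit-scale noiseless targets, and the modeling of the second task as a coordinate permutation of the ground-truth weights are all illustrative choices. It relies on the standard fact that, on an under-determined least-squares problem, gradient descent initialized at `w` converges to the interpolating solution closest to `w`, which we compute in closed form with a pseudoinverse instead of iterating.

```python
import numpy as np

def gd_limit(w, X, y):
    """Closed form for where gradient descent on squared loss converges
    when started from w on an under-determined system: the interpolator
    of (X, y) nearest to w, namely w + X^+ (y - X w)."""
    return w + np.linalg.pinv(X) @ (y - X @ w)

rng = np.random.default_rng(0)
n = 40                        # samples per task (illustrative choice)
forgetting = {}
for d in (50, 400, 3200):     # growing overparameterization ratio d / n
    w_star = rng.normal(size=d) / np.sqrt(d)  # unit-scale ground truth
    perm = rng.permutation(d)                 # task 2: permuted weights
    X1 = rng.normal(size=(n, d)); y1 = X1 @ w_star
    X2 = rng.normal(size=(n, d)); y2 = X2 @ w_star[perm]
    w1 = gd_limit(np.zeros(d), X1, y1)        # train on task 1 ...
    w2 = gd_limit(w1, X2, y2)                 # ... then on task 2
    # For isotropic Gaussian inputs, the population risk on task 1 is
    # the squared parameter error against w_star.
    risk_before = np.sum((w1 - w_star) ** 2)  # after task 1 only
    risk_after = np.sum((w2 - w_star) ** 2)   # after both tasks
    forgetting[d] = risk_after - risk_before  # risk increase from task 2
    print(f"d={d}: task-1 risk {risk_before:.3f} -> {risk_after:.3f}")
```

Running this shows the task-1 risk increase caused by training on the second task shrinking as `d` grows with `n` fixed, consistent with the abstract's claim that a high overparameterization ratio mitigates forgetting in this regime.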