🤖 AI Summary
This work addresses catastrophic forgetting induced by gradient-based training of neural networks in continual learning. We analyze a single-hidden-layer quadratic network trained on an XOR clustering dataset with orthogonal cluster means corrupted by Gaussian noise, deriving theoretical upper bounds on forgetting rates during both training and testing, and validating them empirically. Our contribution is a systematic characterization, both theoretical and experimental, of how the number of tasks, the sample size, the number of optimization iterations, and the hidden-layer width quantitatively govern forgetting, yielding interpretable, tight bounds. Experiments reveal threshold effects: exceeding critical values of key parameters (e.g., the number of tasks or the hidden-layer width) sharply accelerates forgetting. The derived bounds closely match empirical forgetting rates across diverse configurations, demonstrating broad applicability. These results provide theoretically grounded, empirically verifiable guidance for designing robust continual learning algorithms and for principled hyperparameter tuning.
📝 Abstract
Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To shed light on its underlying mechanisms, we analyze the limitations of continual learning in a tractable yet representative setting. In particular, we study one-hidden-layer quadratic neural networks trained by gradient descent on an XOR cluster dataset with Gaussian noise, where different tasks correspond to different clusters with orthogonal means. We derive bounds on the rate of forgetting, at both training and test time, in terms of the number of iterations, the sample size, the number of tasks, and the hidden-layer width. The analysis reveals how each of these problem parameters shapes the rate of forgetting. Numerical experiments across diverse setups confirm our results, demonstrating their validity beyond the analyzed settings.
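To make the analyzed setting concrete, here is a minimal sketch of the setup described above: two XOR cluster tasks with mutually orthogonal Gaussian cluster means, learned sequentially by full-batch gradient descent on a one-hidden-layer network with quadratic activations. All names, hyperparameters (noise level, learning rate, widths), and the choice to freeze the output layer are illustrative assumptions, not the authors' actual experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_xor_task(mu1, mu2, n, sigma=0.1):
    """Sample one XOR cluster task: Gaussian clusters around +/-mu1
    are labeled +1, clusters around +/-mu2 are labeled -1."""
    centers = np.stack([mu1, -mu1, mu2, -mu2])
    labels = np.array([1.0, 1.0, -1.0, -1.0])
    idx = rng.integers(0, 4, size=n)
    X = centers[idx] + sigma * rng.standard_normal((n, mu1.size))
    return X, labels[idx]

def quad_net(X, W, a):
    """One-hidden-layer quadratic network: f(x) = sum_j a_j (w_j . x)^2."""
    return ((X @ W.T) ** 2) @ a

def train_gd(X, y, W, a, lr=0.05, steps=1000):
    """Full-batch gradient descent on the squared loss; the output
    weights a are frozen (a common simplification) and only W moves."""
    n = len(y)
    for _ in range(steps):
        pre = X @ W.T                    # (n, m) pre-activations
        resid = (pre ** 2) @ a - y       # (n,) prediction errors
        # dL/dw_j = (4/n) * sum_i resid_i * a_j * pre_ij * x_i
        W -= lr * (4.0 / n) * ((resid[:, None] * pre * a).T @ X)
    return W

d, m, n = 8, 16, 200
# Two tasks whose cluster means are orthogonal coordinate directions.
e = np.eye(d)
X1, y1 = make_xor_task(e[0], e[1], n)
X2, y2 = make_xor_task(e[2], e[3], n)

W = 0.1 * rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m) / m  # fixed random output layer

W = train_gd(X1, y1, W, a)               # learn task 1
acc1_before = np.mean(np.sign(quad_net(X1, W, a)) == y1)
W = train_gd(X2, y2, W, a)               # then learn task 2
acc1_after = np.mean(np.sign(quad_net(X1, W, a)) == y1)
# Forgetting on task 1 = drop in its accuracy after training on task 2.
print(f"task-1 accuracy: {acc1_before:.2f} -> {acc1_after:.2f}")
```

In this toy version, forgetting is measured exactly as in the abstract: the degradation of performance on an earlier task after gradient descent on a later one, as a function of the number of tasks, iterations, samples, and hidden units.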