Slowing Down Forgetting in Continual Learning

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses catastrophic forgetting in continual learning by proposing ReCL, a self-sustaining framework that requires no external memory. The core method leverages the continual learning model itself as an implicit “memory buffer”: it exploits the model’s inherent implicit bias toward maximum-margin solutions under gradient descent to reverse-engineer synthetic samples from past tasks, which are then jointly trained with incoming task data. ReCL introduces no additional parameters and stores no historical data. Its design is grounded in rigorous theoretical analysis—linking implicit regularization, gradient descent dynamics, and maximum-margin theory—while demonstrating strong generalization across diverse continual learning paradigms (task-, domain-, and class-incremental) and benchmarks (MNIST, CIFAR-10). Empirically, ReCL significantly improves backward transfer—i.e., retention of old-task accuracy—and consistently outperforms state-of-the-art methods without memory replay or parameter expansion.
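The reconstruction idea can be illustrated with a deliberately simplified sketch. This is not the paper's actual algorithm (ReCL targets deep networks and the full max-margin/KKT machinery); here a linear logistic model stands in for the CL model, and `reconstruct_samples` is a hypothetical toy stand-in that synthesizes inputs sitting at unit margin of the trained weights, i.e. pseudo-samples consistent with the old task's decision boundary, without storing any historical data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w=None, lr=0.5, steps=500):
    """Gradient descent on logistic loss; labels y in {-1, +1}."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        # d/dw log(1 + exp(-y * <w, x>)) = -y * x * sigmoid(-y * <w, x>)
        grad = -(X * (y * sigmoid(-y * (X @ w)))[:, None]).mean(axis=0)
        w = w - lr * grad
    return w

def reconstruct_samples(w, n_samples=20, lr=0.5, steps=200):
    """Toy stand-in for ReCL's reconstruction step (an assumption, not the
    paper's method): move random inputs until y * <w, x> = 1, so each
    synthetic pair is classified with unit margin by the old model."""
    X = rng.normal(size=(n_samples, w.size))
    y = np.where(rng.random(n_samples) < 0.5, -1.0, 1.0)
    for _ in range(steps):
        m = y * (X @ w)  # current margins
        # each step contracts (1 - m) by a factor of (1 - lr)
        X += lr * ((1.0 - m) * y)[:, None] * w / (w @ w)
    return X, y

# Old task: two Gaussian blobs, separable along both features
X1 = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y1 = np.concatenate([-np.ones(50), np.ones(50)])
w1 = train_logreg(X1, y1)

# "Replay" data reverse-engineered from the trained weights alone;
# it could then be mixed into the next task's training batch.
Xr, yr = reconstruct_samples(w1)
```

In this toy setting the reconstructed points all end up at unit margin of `w1`, so training on them alongside new-task data penalizes weight updates that would flip the old decision boundary, which is the intuition the framework builds on.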

📝 Abstract
A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of gradient-based neural networks, which causes them to converge to margin-maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods to slow down forgetting. We further demonstrate the performance gain from our framework across a large series of experiments, including different CL scenarios (class-incremental, domain-incremental, and task-incremental learning), different datasets (MNIST, CIFAR10), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.
Problem

Research questions and friction points this paper is trying to address.

Catastrophic forgetting: accuracy on old tasks degrades as new tasks are learned sequentially.
Existing remedies typically depend on external memory buffers (replay) or on expanding the model's parameters.
How to retain old-task knowledge without storing any historical data or adding parameters.
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReCL uses the CL model itself as an implicit memory buffer, storing no historical data and adding no parameters.
Exploits gradient descent's implicit bias toward maximum-margin solutions to reconstruct synthetic samples from past tasks, which are trained jointly with new-task data.
Plugs into existing state-of-the-art CL methods and improves backward transfer across task-, domain-, and class-incremental scenarios.