🤖 AI Summary
This work proposes the FOPNG optimizer to address catastrophic forgetting in continual learning, where neural networks tend to overwrite knowledge of previously learned tasks when acquiring new ones. By unifying natural gradient and orthogonal gradient concepts within an information-geometric framework, FOPNG projects the current gradient onto the orthogonal complement of prior task gradients under the Fisher information metric, thereby preserving previously acquired knowledge. The resulting update direction is invariant to parameterization and guarantees descent under the Fisher metric. An efficient implementation based on a diagonal approximation of the Fisher matrix lets FOPNG achieve strong performance across standard continual learning benchmarks, including Permuted-MNIST, Split-MNIST, Rotated-MNIST, Split-CIFAR10, and Split-CIFAR100.
📝 Abstract
Continual learning aims to enable neural networks to acquire new knowledge from a sequence of tasks; the key challenge in such settings is to learn new tasks without catastrophically forgetting previously learned ones. We propose the Fisher-Orthogonal Projected Natural Gradient Descent (FOPNG) optimizer, which enforces Fisher-orthogonal constraints on parameter updates to preserve old-task performance while learning new tasks. Unlike existing methods that operate in Euclidean parameter space, FOPNG projects gradients onto the Fisher-orthogonal complement of previous task gradients. This approach unifies natural gradient descent with orthogonal gradient methods within an information-geometric framework. We provide theoretical analysis deriving the projected update, describe efficient and practical implementations using a diagonal Fisher approximation, and demonstrate strong results on standard continual learning benchmarks such as Permuted-MNIST, Split-MNIST, Rotated-MNIST, Split-CIFAR10, and Split-CIFAR100. Our code is available at https://github.com/ishirgarg/FOPNG.
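To make the idea concrete, here is a minimal NumPy sketch of a Fisher-orthogonal projection under a diagonal Fisher approximation, followed by a natural-gradient step. The function names, the sequential deflation against stored previous-task gradients, and the final preconditioning are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def fisher_orthogonal_project(grad, prev_grads, fisher_diag, eps=1e-12):
    """Project `grad` onto the Fisher-orthogonal complement of `prev_grads`.

    Uses the Fisher inner product <u, v>_F = u^T diag(F) v with a diagonal
    Fisher approximation. Sequential deflation is exact only if the stored
    gradients are mutually F-orthogonal; this is a simplified sketch.
    """
    g = np.asarray(grad, dtype=float).copy()
    F = np.asarray(fisher_diag, dtype=float)
    for p in prev_grads:
        Fp = F * p                      # diag(F) @ p, elementwise
        coef = np.dot(g, Fp) / (np.dot(p, Fp) + eps)
        g -= coef * p                   # remove the F-component along p
    return g

def fopng_step(params, grad, prev_grads, fisher_diag, lr=0.1, eps=1e-8):
    """One illustrative update: project, then precondition by diag(F)^-1."""
    g_proj = fisher_orthogonal_project(grad, prev_grads, fisher_diag)
    return params - lr * g_proj / (fisher_diag + eps)
```

By construction, the projected gradient has zero Fisher inner product with each (orthogonalized) previous-task gradient, so a small step along it changes the model's predictive distribution minimally in the directions that mattered for earlier tasks.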