🤖 AI Summary
To address catastrophic forgetting and the memory growth caused by replay mechanisms in incremental generative modeling, this paper proposes a replay-free VAE framework with controlled parameter growth. Methodologically, it designs a multi-modal latent space and imposes orthogonality constraints on latent vectors to suppress cross-task interference; it supports both static and dynamic VAE variants, with no (or at most a controlled) growth in the number of parameters as the task count increases. Key contributions include: (1) a novel replay-free approach to incremental generative modeling; (2) a forgetting-mitigation criterion grounded in latent-space orthogonality; and (3) at least an order-of-magnitude reduction in memory overhead relative to closely related methods. The framework is reported to achieve state-of-the-art accuracy on both generative and classification benchmarks, improving the scalability and practicality of incremental generative models.
📝 Abstract
Continual (or incremental) learning holds tremendous potential in deep learning, but it faces several challenges, including catastrophic forgetting. The advent of powerful foundation and generative models has propelled this paradigm even further, making it one of the most viable solutions for training these models. However, one persistent issue is the growing volume of data, particularly with replay-based methods. This growth limits scalability, since continuously expanding data becomes increasingly demanding as the number of tasks grows. In this paper, we alleviate this issue by devising a novel replay-free incremental learning model based on Variational Autoencoders (VAEs). The main contributions of this work include (i) a novel incremental generative model built upon a well-designed multi-modal latent space, and (ii) an orthogonality criterion that mitigates catastrophic forgetting of the learned VAEs. The proposed method considers two variants of these VAEs, static and dynamic, with no (or at most a controlled) growth in the number of parameters. Extensive experiments show that our method is at least an order of magnitude more "memory-frugal" than closely related works while achieving SOTA accuracy scores.
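To make the idea of an orthogonality criterion on latent vectors more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which the abstract does not spell out. It assumes each task is represented by a single latent vector (for example, the mean of that task's mode in the latent space) and penalizes the squared cosine similarity between every pair of task vectors. The function name `orthogonality_penalty` and the input `task_means` are hypothetical and introduced only for illustration.

```python
import math


def orthogonality_penalty(task_means):
    """Sum of squared cosine similarities over all distinct pairs of vectors.

    Assumption: `task_means` is a list of equal-length lists of floats,
    one latent vector per task. The penalty is 0.0 when all vectors are
    mutually orthogonal and grows as vectors become more aligned.
    """

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    penalty = 0.0
    for i in range(len(task_means)):
        for j in range(i + 1, len(task_means)):
            u, v = task_means[i], task_means[j]
            norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
            if norm == 0.0:
                continue  # skip degenerate zero vectors
            penalty += (dot(u, v) / norm) ** 2
    return penalty


# Two orthogonal task vectors incur no penalty; two identical ones incur 1.0.
print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))
print(orthogonality_penalty([[1.0, 0.0], [1.0, 0.0]]))
```

In a training setup, a term of this kind would typically be added to the usual VAE objective with a weighting coefficient, so that latent representations of new tasks are pushed away from directions already occupied by earlier tasks. How the paper actually weights or defines its criterion is not stated in the abstract.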