๐ค AI Summary
In continual learning, random initialization of novel-class classifiers induces sharp initial loss spikes and training instability, leading to slow convergence and high computational overhead. To address this, we propose a data-driven weight initialization method for classifier weights in the feature space. Specifically, we introduce, for the first time in continual learning, the closed-form least-squares classifier solution derived from neural collapse theoryโenabling task-adaptive, training-free weight initialization. Our approach requires no additional parameters or fine-tuning; it solely leverages the feature distribution of new-class samples extracted by a frozen backbone network. Experiments demonstrate that our method significantly suppresses initial loss spikes, accelerates adaptation to new tasks, improves final accuracy on mainstream large-scale continual learning benchmarks (e.g., CIFAR-100, ImageNet-1K), and reduces convergence-related computational cost by over 30%.
๐ Abstract
To adapt to real-world data streams, continual learning (CL) systems must rapidly learn new concepts while preserving and utilizing prior knowledge. When it comes to adding new information to continually-trained deep neural networks (DNNs), classifier weights for newly encountered categories are typically initialized randomly, leading to high initial training loss (spikes) and instability. Consequently, achieving optimal convergence and accuracy requires prolonged training, increasing computational costs. Inspired by Neural Collapse (NC), we propose a weight initialization strategy to improve learning efficiency in CL. In DNNs trained with mean-squared-error, NC gives rise to a Least-Square (LS) classifier in the last layer, whose weights can be analytically derived from learned features. We leverage this LS formulation to initialize classifier weights in a data-driven manner, aligning them with the feature distribution rather than using random initialization. Our method mitigates initial loss spikes and accelerates adaptation to new tasks. We evaluate our approach in large-scale CL settings, demonstrating faster adaptation and improved CL performance.