📝 Abstract
Loss of plasticity is a phenomenon in which a neural network loses its ability to learn when trained for an extended time on non-stationary data. It is a crucial problem to overcome when designing systems that learn continually. An effective technique for preventing loss of plasticity is reinitializing parts of the network. In this paper, we compare two different reinitialization schemes: reinitializing units vs. reinitializing weights. We propose a new algorithm, which we name *selective weight reinitialization*, for reinitializing the least useful weights in a network. We compare our algorithm to continual backpropagation and ReDo, two previously proposed algorithms that reinitialize units in the network. Through our experiments on continual supervised learning problems, we identify two settings in which reinitializing weights is more effective at maintaining plasticity than reinitializing units: (1) when the network has a small number of units and (2) when the network includes layer normalization. Conversely, reinitializing weights and reinitializing units are equally effective at maintaining plasticity when the network is of sufficient size and does not include layer normalization. Overall, we find that reinitializing weights maintains plasticity in a wider variety of settings than reinitializing units.
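To make the core idea concrete, the following is a minimal sketch of selective weight reinitialization: score each weight by a usefulness measure, then redraw the lowest-scoring fraction from the initial distribution. The abstract does not specify the utility measure or hyperparameters, so the magnitude-based utility, the reset fraction, and the Gaussian reinitialization scale below are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def selective_weight_reinit(W, utility, fraction=0.01, init_scale=0.1, rng=None):
    """Reinitialize the lowest-utility fraction of weights in W.

    `utility` is a per-weight usefulness score with the same shape as W.
    The utility measure, `fraction`, and `init_scale` are assumptions made
    for illustration; the paper's algorithm may differ in these details.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_reset = max(1, int(fraction * W.size))
    # Flattened indices of the least useful weights.
    flat_idx = np.argsort(utility, axis=None)[:n_reset]
    idx = np.unravel_index(flat_idx, W.shape)
    W = W.copy()
    # Redraw only the selected weights from the initial distribution,
    # leaving the rest of the network untouched.
    W[idx] = rng.normal(0.0, init_scale, size=n_reset)
    return W

# Example: use weight magnitude as a simple utility proxy.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(4, 4))
W_new = selective_weight_reinit(W, utility=np.abs(W), fraction=0.25, rng=rng)
```

Because only individual weights are redrawn, the reset is far more localized than unit reinitialization, which replaces all incoming and outgoing weights of an entire neuron.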