🤖 AI Summary
Unbounded weight growth during neural network training leads to plasticity loss, degrading generalization and optimization stability. To address this, we propose Soft Weight Rescaling (SWR), a lightweight technique that rescales layer-wise weights at each gradient descent step, constraining their magnitudes and equalizing inter-layer distributions, without requiring weight reinitialization. Theoretically, SWR mitigates gradient degradation and preserves the efficacy of parameter updates. Empirically, it consistently improves performance across warm-start image classification, continual learning, and single-task settings; in continual learning, it boosts average accuracy by up to 3.2% while retaining previously acquired knowledge throughout training. The work demonstrates, both theoretically and empirically, that minimal, adaptive weight rescaling suffices to sustain plasticity, obviating the need for costly reinitialization strategies.
📝 Abstract
Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss; it also harms generalization and disrupts optimization dynamics. Re-initializing the network can be a solution, but it discards learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitudes and balances them between layers. Our experiments show that SWR improves performance in warm-start learning, continual learning, and single-task learning setups on standard image classification benchmarks.
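The core idea described above, scaling each layer's weights down at every training step so that their magnitudes stay bounded, can be illustrated with a minimal sketch. Note that the abstract does not give the paper's exact update rule, so the interpolation form and the `ratio` parameter below are assumptions made purely for illustration: each layer's weight norm is pulled geometrically toward its norm at initialization, which bounds growth without zeroing out learned structure.

```python
import numpy as np

def soft_weight_rescale(weights, init_norms, ratio=0.99):
    """Illustrative sketch of per-layer soft rescaling (not the paper's
    exact rule). Each layer's Frobenius norm is nudged toward its
    initial norm, so unbounded growth is gently pulled back while the
    direction of the weights (the learned information) is preserved.
    """
    rescaled = []
    for w, n0 in zip(weights, init_norms):
        n = np.linalg.norm(w)
        # Target norm interpolates between the current and initial norm;
        # the fixed point of repeated application is n == n0.
        target = ratio * n + (1.0 - ratio) * n0
        rescaled.append(w * (target / n))
    return rescaled

# Usage: record each layer's norm at initialization, then call
# soft_weight_rescale(...) after every optimizer step.
```

Under this assumed rule, a layer whose norm has grown past its initial value shrinks by a small factor each step, while a layer at its initial scale is left unchanged, which is one simple way to realize both the boundedness and the inter-layer balancing properties mentioned in the abstract.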