🤖 AI Summary
Problem: Non-differentiable operations in ultra-low-bit quantization (e.g., 1-bit) and highly sparse pruning (>99% sparsity) severely destabilize backpropagation, causing conventional methods like the Straight-Through Estimator (STE) to fail catastrophically during training.
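As background, the STE mentioned above treats the non-differentiable sign() quantizer as the identity on the backward pass, typically masking gradients where |w| > 1 (the clipped variant). A minimal NumPy sketch of that idea, not the paper's method:

```python
import numpy as np

def ste_binarize_forward(w):
    # Forward pass: 1-bit quantization; sign() has zero gradient almost everywhere.
    return np.sign(w)

def ste_binarize_backward(w, grad_out):
    # STE backward pass: pretend the quantizer was the identity, masking
    # gradients where |w| > 1 (the clipped / hard-tanh variant).
    return grad_out * (np.abs(w) <= 1.0)

w = np.array([0.3, -1.7, 0.05, 2.0])
g = np.ones_like(w)
print(ste_binarize_forward(w))      # [ 1. -1.  1.  1.]
print(ste_binarize_backward(w, g))  # [1. 0. 1. 0.]
```

Because the surrogate gradient increasingly mismatches the true (zero-almost-everywhere) gradient at 1-bit precision and extreme sparsity, this approximation is what the summary describes as failing.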
Method: We propose a perturbation modeling paradigm that unifies quantization and pruning as structured perturbations injected during training, and introduce a differentiable denoising affine transformation for gradient approximation. Further, we design a robust ridge regression–based training framework featuring a piecewise-constant backbone network to guarantee performance lower bounds, coupled with an adaptive noise suppression mechanism enabling end-to-end optimization across arbitrary bit-widths and sparsity levels.
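The framework itself is not reproduced here; as a generic illustration of why a ridge penalty makes regression robust to perturbations, the sketch below compares ordinary least squares and ridge on nearly collinear, perturbation-corrupted synthetic data (all data and values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
# Two nearly collinear features: X^T X is ill-conditioned, so plain least
# squares wildly amplifies small perturbations in the targets.
X = np.column_stack([x, x + 1e-6 * rng.normal(size=n)])
y = x + 0.01 * rng.normal(size=n)  # perturbed targets (stand-in for quantization noise)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)  # ordinary least squares: coefficients blow up
w_rdg = ridge(X, y, 1.0)  # ridge penalty keeps coefficients bounded
print(np.linalg.norm(w_ols), np.linalg.norm(w_rdg))
```

The ridge term lam * I bounds the inverse even when X^T X is nearly singular, which is the noise-suppression property the framework described above builds on.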
Contribution/Results: Our approach achieves, for the first time, stable convergence under simultaneous 1-bit quantization and >99% sparsity. It also narrows the gap between artificial neural networks and biological neurons in terms of dynamic binary learning mechanisms.
📝 Abstract
The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low bit-widths and arbitrarily high sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
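The denoising affine transform itself is not specified in this abstract. As a generic illustration of the underlying idea, a least-squares affine map a*q + b can partially undo the perturbation introduced by 1-bit quantization (synthetic data; this is a sketch of the general principle, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
q = np.sign(w)  # 1-bit quantization of the weights

# Least-squares affine fit: find (a, b) minimizing ||a*q + b - w||^2.
A = np.column_stack([q, np.ones_like(q)])
a, b = np.linalg.lstsq(A, w, rcond=None)[0]

err_raw = np.mean((q - w) ** 2)          # error of the raw binarized weights
err_fit = np.mean((a * q + b - w) ** 2)  # error after affine correction
print(a, b, err_raw, err_fit)
```

Since (a, b) = (1, 0) is one candidate in the least-squares search, the fitted affine map can never reconstruct the original weights worse than the raw binarization does; for zero-mean Gaussian weights the optimal scale a is close to mean(|w|), the scaling familiar from binary-weight networks.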