Robust Training of Neural Networks at Arbitrary Precision and Sparsity

📅 2024-09-14
🏛️ arXiv.org
📈 Citations: 2
Influential citations: 0
🤖 AI Summary
Non-differentiable operations in ultra-low-bit quantization (e.g., 1-bit) and highly sparse pruning (>99% sparsity) destabilize backpropagation, causing conventional methods such as the Straight-Through Estimator (STE) to fail during training. Method: We propose a perturbation-modeling paradigm that unifies quantization and pruning as structured perturbations injected during training, and introduce a differentiable denoising affine transformation for gradient approximation. We further design a robust ridge-regression-based training framework featuring a piecewise-constant backbone network to guarantee a performance lower bound, coupled with an adaptive noise-suppression mechanism that enables end-to-end optimization across arbitrary bit-widths and sparsity levels. Contribution/Results: The approach achieves stable convergence under simultaneous 1-bit quantization and >99% sparsity, and offers a new perspective on training temporal binary neural networks, helping to narrow the gap between artificial and biological neurons.
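The denoising affine transformation is described here only at a high level. One minimal way to realize such a transform is a per-tensor ridge regression that fits a scale and an offset mapping the perturbed (quantized or pruned) values back toward the full-precision weights. The sketch below is an illustrative reading under that assumption; the function name, the per-tensor flattening, and the regularizer lam are choices made for this example, not details from the paper.

```python
import torch

def denoising_affine(w_q: torch.Tensor, w: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """Illustrative denoising affine transform (not the paper's implementation).

    Fits a scale a and offset b by ridge regression so that a * w_q + b
    approximates the full-precision tensor w, where w_q is the perturbed
    (quantized and/or pruned) version of w.
    """
    q = w_q.flatten()
    X = torch.stack([q, torch.ones_like(q)], dim=1)            # design matrix [q, 1]
    A = X.T @ X + lam * torch.eye(2, dtype=w.dtype, device=w.device)
    beta = torch.linalg.solve(A, X.T @ w.flatten())            # closed-form ridge solution
    return (beta[0] * w_q + beta[1]).reshape(w.shape)

# Example: denoise a 1-bit quantization of a random weight tensor.
w = torch.randn(64, 64)
w_hat = denoising_affine(torch.sign(w), w)
```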

📝 Abstract
The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
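The abstract's claim that existing models can be trained with off-the-shelf recipes suggests a drop-in layer that keeps full-precision latent weights and injects the perturbation and the denoiser in its forward pass. The module below is a hypothetical sketch of that idea: the class name, the magnitude-pruning rule, the straight-through routing, and the hyperparameters are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisedBinarySparseLinear(nn.Linear):
    """Hypothetical drop-in linear layer trained at 1-bit precision and high sparsity."""

    def __init__(self, in_features, out_features, sparsity=0.99, lam=1e-3, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.sparsity = sparsity
        self.lam = lam

    def forward(self, x):
        w = self.weight
        # Inject the perturbation: magnitude pruning to the target sparsity, then 1-bit quantization.
        thresh = torch.quantile(w.abs().flatten(), self.sparsity)
        w_q = torch.sign(w) * (w.abs() >= thresh).to(w.dtype)
        # Denoising affine transform: ridge-fitted scale and offset, as sketched above.
        q = w_q.flatten()
        X = torch.stack([q, torch.ones_like(q)], dim=1)
        A = X.T @ X + self.lam * torch.eye(2, dtype=w.dtype, device=w.device)
        beta = torch.linalg.solve(A, X.T @ w.flatten())
        w_hat = (beta[0] * w_q + beta[1]).reshape(w.shape)
        # Straight-through-style routing so a standard optimizer updates the latent weights w.
        return F.linear(x, w + (w_hat - w).detach(), self.bias)
```

Swapping such a layer in for nn.Linear would leave the optimizer, schedule, and loss untouched, which is one way the "off-the-shelf recipes" claim could be realized.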
Problem

Research questions and friction points this paper is trying to address.

Overcoming backpropagation obstacles in ultra-low precision neural networks
Addressing gradient mismatch in quantization-aware training with STE (see the STE sketch after this list)
Enabling stable training of binary and sparse sub-1-bit networks
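For context, the STE mentioned above replaces the almost-everywhere-zero gradient of the quantizer with an identity surrogate; the mismatch between that surrogate and the true operation is the friction point these items refer to, and it grows severe at 1-bit precision and extreme sparsity. A minimal PyTorch illustration of the standard estimator (not code from the paper):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Straight-Through Estimator: sign() in the forward pass, identity in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # pretend d sign(w)/dw == 1: the gradient mismatch

w = torch.randn(8, requires_grad=True)
y = BinarizeSTE.apply(w)
y.sum().backward()
print(w.grad)                          # all ones, regardless of how w was quantized
```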
Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising dequantization transform for robust training
Unified framework for quantization and sparsification
Corrective gradient path for quantization error (see the sketch after this list)
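The "corrective gradient path" is not spelled out on this page. One hedged reading is that, because the denoiser's scale and offset are fit against the full-precision weights, keeping that fit inside autograd gives gradients a route back to the weights even though sign() is flat. The short sketch below illustrates that reading only; it is an assumption, not the paper's formulation.

```python
import torch

w = torch.randn(512, requires_grad=True)
w_q = torch.sign(w).detach()                   # flat operation: no gradient through sign()

# Ridge-fitted affine dequantization; the fit uses w as the regression target,
# so the reconstruction keeps a differentiable path back to w.
X = torch.stack([w_q, torch.ones_like(w_q)], dim=1)
beta = torch.linalg.solve(X.T @ X + 1e-3 * torch.eye(2), X.T @ w)
w_hat = beta[0] * w_q + beta[1]

(w_hat ** 2).sum().backward()
print(w.grad.abs().mean())                     # non-zero: a corrective gradient exists
```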
Authors
Chengxi Ye, Google DeepMind (Deep Learning, Computer Vision, Bioinformatics)
Grace Chu, Google DeepMind
Yanfeng Liu, Google DeepMind
Yichi Zhang, Google DeepMind
Lukasz Lew, Google DeepMind
Andrew Howard, Google DeepMind