A Principled Bayesian Framework for Training Binary and Spiking Neural Networks

📅 2025-05-23
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Binary Neural Networks (BNNs) and Spiking Neural Networks (SNNs) are difficult to train end to end and tend to underperform when normalization layers are removed. This work addresses both problems with a unified Bayesian modelling framework: (1) a family of Importance-Weighted Straight-Through (IW-ST) estimators, with a theoretical analysis of their bias–variance trade-off; (2) the first variational inference framework for spiking BNNs, which exploits posterior noise for implicit regularization and low-bias gradient optimization; and (3) the removal of handcrafted gradient approximations and of any dependence on normalization layers. The method achieves state-of-the-art or competitive results on CIFAR-10, DVS Gesture, and SHD benchmarks, demonstrating for the first time stable end-to-end training of deep residual BNNs and SNNs without any normalization layers.
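To make the estimator family concrete, here is a minimal PyTorch sketch of a plain straight-through (ST) estimator for a stochastic binary unit. The paper's IW-ST family generalises the backward pass with an importance weight; the constant weight below is a placeholder assumption, not the paper's construction.

```python
import torch

class STBernoulli(torch.autograd.Function):
    """Straight-through estimator for a stochastic binary neuron.

    Forward: sample z ~ Bernoulli(sigmoid(logits)).
    Backward: differentiate through sigmoid(logits) as if the sample
    were the mean. IW-ST reweights this backward signal; the identity
    weight used here recovers plain ST.
    """

    @staticmethod
    def forward(ctx, logits):
        p = torch.sigmoid(logits)
        ctx.save_for_backward(p)
        return torch.bernoulli(p)

    @staticmethod
    def backward(ctx, grad_output):
        (p,) = ctx.saved_tensors
        iw = 1.0  # importance weight; IW-ST would make this sample-dependent
        return grad_output * iw * p * (1.0 - p)
```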

📝 Abstract
We propose a Bayesian framework for training binary and spiking neural networks that achieves state-of-the-art performance without normalisation layers. Unlike commonly used surrogate gradient methods, which are often heuristic and sensitive to hyperparameter choices, our approach is grounded in a probabilistic model of noisy binary networks, enabling fully end-to-end gradient-based optimisation. We introduce importance-weighted straight-through (IW-ST) estimators, a unified class generalising straight-through and relaxation-based estimators. We characterise the bias–variance trade-off in this family and derive a bias-minimising objective implemented via an auxiliary loss. Building on this, we introduce Spiking Bayesian Neural Networks (SBNNs), a variational inference framework that uses posterior noise to train Binary and Spiking Neural Networks with IW-ST. This Bayesian approach minimises gradient bias, regularises parameters, and introduces dropout-like noise. By linking low-bias conditions, vanishing gradients, and the KL term, we enable training of deep residual networks without normalisation. Experiments on CIFAR-10, DVS Gesture, and SHD show our method matches or exceeds existing approaches without normalisation or hand-tuned gradients.
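As a rough illustration of the variational setup described in the abstract, the following is a minimal sketch (not the paper's implementation) of a binary linear layer with a learned Bernoulli posterior over {-1, +1} weights and a KL penalty against a Bernoulli prior. The class name `BayesianBinaryLinear` and the uniform prior are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianBinaryLinear(nn.Module):
    """Hypothetical variational binary layer in the spirit of SBNNs:
    weights are {-1, +1} with a learned Bernoulli posterior, and the
    sampling noise acts as implicit (dropout-like) regularisation.
    """

    def __init__(self, in_features, out_features, prior_p=0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(out_features, in_features))
        self.prior_p = prior_p

    def forward(self, x):
        p = torch.sigmoid(self.logits)   # posterior probability P(w = +1)
        z = torch.bernoulli(p)           # sample binary weight configuration
        # Straight-through: forward uses the sample, backward the mean.
        w = (2.0 * z - 1.0).detach() + (2.0 * p - 1.0) - (2.0 * p - 1.0).detach()
        return F.linear(x, w)

    def kl(self):
        # KL divergence between Bernoulli(p) posterior and Bernoulli(prior_p) prior.
        p = torch.sigmoid(self.logits).clamp(1e-6, 1 - 1e-6)
        q = self.prior_p
        return (p * torch.log(p / q)
                + (1 - p) * torch.log((1 - p) / (1 - q))).sum()
```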
Problem

Research questions and friction points this paper is trying to address.

Surrogate gradient training of binary/spiking networks is heuristic and sensitive to hyperparameter choices
Straight-through and relaxation-based estimators suffer a gradient bias–variance trade-off with no principled way to control it
Deep residual BNNs and SNNs could not previously be trained end to end without normalization layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Bayesian framework for training binary and spiking networks
Importance-weighted straight-through (IW-ST) estimators with a bias-minimising auxiliary loss
Spiking Bayesian Neural Networks (SBNNs) trained via variational inference (a toy training step is sketched below)
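Tying the pieces together, a hypothetical training step using the sketches above would add the KL term to the task loss, ELBO-style; `loader` and the KL weight 1e-4 are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

model = BayesianBinaryLinear(784, 10)  # sketch class defined above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x, y in loader:  # `loader` yields (inputs, labels); assumed
    logits = model(x)
    # ELBO-style objective: task likelihood plus weighted KL regulariser
    loss = F.cross_entropy(logits, y) + 1e-4 * model.kl()
    opt.zero_grad()
    loss.backward()
    opt.step()
```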