🤖 AI Summary
To address the challenges of deploying spiking neural networks (SNNs) on resource-constrained edge devices and to unlock their energy-efficiency potential, this work proposes a lightweight, event-driven runtime framework implemented in C. Methodologically, it integrates spike-sparsity-aware joint pruning of neurons and synapses, static memory pre-allocation, compact binary data representation, and cache-aware optimization, while enabling end-to-end export from SNNTorch models. To our knowledge, this is the first efficient SNN inference implementation on microcontrollers such as the Arduino Portenta H7, achieving accuracy comparable to Python-based baselines on N-MNIST and ST-MNIST. It delivers a ~10× speedup over the Python baseline on desktop CPUs, with significantly reduced memory footprint, inference latency, and energy consumption. The core contributions are: (i) an embedded-oriented SNN sparsification and compression paradigm, and (ii) a zero-dynamic-memory-allocation lightweight runtime design.
📝 Abstract
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices and optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from SNNTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on desktop CPUs and additional gains with pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/