Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware

📅 2025-11-15
🤖 AI Summary
To address the challenges of deploying spiking neural networks (SNNs) on resource-constrained edge devices and unlocking their energy-efficiency potential, this work proposes a lightweight, event-driven runtime framework implemented in C. Methodologically, it integrates spike-sparsity-aware joint pruning of neurons and synapses, static memory pre-allocation, compact binary data representation, and cache-aware optimization, while enabling end-to-end export from SNNTorch models. To our knowledge, this is the first efficient SNN inference implementation on microcontrollers such as the Arduino Portenta H7, achieving accuracy comparable to Python-based baselines on N-MNIST and ST-MNIST. It delivers a ~10× speedup over the Python baseline on desktop CPUs, with significantly reduced memory footprint, inference latency, and energy consumption. The core contributions are: (i) an embedded-oriented SNN sparsification and compression paradigm, and (ii) a zero-dynamic-memory-allocation lightweight runtime design.

📝 Abstract
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices and optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from SNNTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on desktop CPU and additional gains with pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/
Problem

Research questions and friction points this paper is trying to address.

Optimizing SNN inference for resource-constrained edge devices
Reducing latency and memory usage without sacrificing accuracy
Enabling efficient SNN deployment on conventional embedded platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight C runtime for SNN edge inference
Static cache-friendly layouts eliminate allocation overhead
Sparsity pruning shrinks computation in convolutional layers
Karol C. Jurzec
AGH University of Science and Technology, Institute of Computer Science, Kraków, Poland
Tomasz Szydlo
AGH University of Science and Technology, Institute of Computer Science, Kraków, Poland; School of Computing, Newcastle University, Newcastle upon Tyne, UK
Maciej Wielgosz
AGH University of Science and Technology
Cognitive Computing · Machine Learning · Deep Learning · Hardware Acceleration