Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware

📅 2025-11-15
🤖 AI Summary
To address the challenges of deploying spiking neural networks (SNNs) on resource-constrained edge devices and unlocking their energy-efficiency potential, this work proposes a lightweight, event-driven runtime framework implemented in C. Methodologically, it integrates spike-sparsity-aware joint pruning of neurons and synapses, static memory pre-allocation, compact binary data representation, and cache-aware optimization, while enabling end-to-end export from SNNTorch models. To our knowledge, this is the first efficient SNN inference implementation on microcontrollers such as the Arduino Portenta H7, achieving accuracy comparable to Python-based baselines on N-MNIST and ST-MNIST. It delivers a ~10× speedup over the Python baseline on desktop CPUs, with significantly reduced memory footprint, inference latency, and energy consumption. The core contributions are: (i) an embedded-oriented SNN sparsification and compression paradigm, and (ii) a zero-dynamic-memory-allocation lightweight runtime design.

📝 Abstract
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices and optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from SNNTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on desktop CPU and additional gains with pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/
Problem

Research questions and friction points this paper is trying to address.

Optimizing SNN inference for resource-constrained edge devices
Reducing latency and memory usage without sacrificing accuracy
Enabling efficient SNN deployment on conventional embedded platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight C runtime for SNN edge inference
Static cache-friendly layouts eliminate allocation overhead
Sparsity pruning shrinks computation in convolutional layers
Karol C. Jurzec
AGH University of Science and Technology, Institute of Computer Science, Kraków, Poland
Tomasz Szydlo
AGH University of Science and Technology, Institute of Computer Science, Kraków, Poland; School of Computing, Newcastle University, Newcastle upon Tyne, UK
Maciej Wielgosz
AGH University of Science and Technology
Cognitive Computing · Machine Learning · Deep Learning · Hardware Acceleration