STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of poor parallelism, low data reuse, high latency, and suboptimal energy efficiency in hardware acceleration of Spiking Neural Networks (SNNs)—stemming from irregular spike timing—this work presents the first FPGA accelerator for single-timestep inference under resource-constrained conditions. The approach employs algorithm-hardware co-design: (1) a novel single-timestep SNN inference architecture; (2) a timing-aware pruning algorithm built upon Temporal Efficient Training (TET); (3) an Output-Stationary (OS) dataflow with a compressed, sorted spike buffer; and (4) support for depthwise separable convolutions alongside inter- and intra-layer parallelism. Experimental results demonstrate 0.14 GOPS/W/PE energy efficiency, significantly reduced membrane-potential storage overhead and memory accesses, and substantially improved inference speed, all with no loss of accuracy.
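The summary above mentions a pruning algorithm built on Temporal Efficient Training (TET). As background for readers unfamiliar with TET, a minimal sketch of the TET loss idea follows: the classification loss is applied at every timestep and averaged, rather than once on the time-averaged output, which keeps each individual timestep informative and is what makes aggressive timestep reduction viable. This is the standard TET formulation, not the paper's specific pruning variant, and all names here are illustrative.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def tet_loss(outputs_per_step, label):
    """TET-style loss: average the per-timestep classification losses,
    instead of classifying only the time-averaged output. Each timestep
    is pushed toward the correct class on its own, so inference can
    later be truncated to fewer timesteps (here: a single one)."""
    return float(np.mean([cross_entropy(o, label) for o in outputs_per_step]))
```

With uniform logits over C classes the loss reduces to ln(C) at every timestep, so the average is ln(C) as well, which is a convenient sanity check.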

📝 Abstract
Brain-inspired Spiking Neural Networks (SNNs) have attracted attention for their event-driven characteristics and high energy efficiency. However, the temporal dependency and irregularity of spikes present significant challenges for hardware parallel processing and data reuse, causing some existing accelerators to fall short in processing latency and energy efficiency. To overcome these challenges, we introduce the STI-SNN accelerator, designed for resource-constrained applications with high energy efficiency, flexibility, and low latency. The accelerator is designed through algorithm and hardware co-design. Firstly, STI-SNN can perform inference in a single timestep. At the algorithm level, we introduce a temporal pruning approach based on the temporal efficient training (TET) loss function. This approach alleviates spike disappearance during timestep reduction, maintains inference accuracy, and expands TET's applicability. In hardware design, we analyze data access patterns and adopt the output stationary (OS) dataflow, eliminating membrane-potential storage and the associated memory accesses. Furthermore, based on the OS dataflow, we propose a compressed and sorted representation of spikes, which is then cached in the line buffer to reduce memory access cost and improve reuse efficiency. Secondly, STI-SNN supports different convolution methods. By adjusting the computation mode of processing elements (PEs) and parameterizing the computation array, STI-SNN can accommodate lightweight models based on depthwise separable convolutions (DSCs), further enhancing hardware flexibility. Lastly, STI-SNN also supports both inter-layer and intra-layer parallel processing. For inter-layer parallelism, we ...
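The abstract describes a compressed, sorted spike representation that is cached in a line buffer. A minimal sketch of the idea, assuming the representation simply keeps the coordinates of fired neurons in raster order (the paper's exact encoding is not specified here, and `compress_spikes` is an illustrative name):

```python
def compress_spikes(spike_map):
    """Compress a binary spike map to a sorted list of active coordinates.

    Spikes are sparse, so storing only the (row, col) indices of fired
    neurons, sorted in raster order, lets a line buffer stream them
    sequentially instead of scanning the full dense map.
    """
    coords = [(r, c)
              for r, row in enumerate(spike_map)
              for c, v in enumerate(row) if v]
    return sorted(coords)
```

For a sparse map, the compressed list touches far fewer memory words than the dense array, which is the reuse/bandwidth benefit the abstract is pointing at.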
Problem

Research questions and friction points this paper is trying to address.

Overcoming temporal dependency and spike irregularity in SNNs
Enhancing energy efficiency and reducing processing latency
Supporting flexible convolution methods and parallel processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-timestep inference with temporal pruning
Output stationary dataflow for memory efficiency
Flexible convolution methods and parallel processing
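To make the "output stationary dataflow for memory efficiency" point concrete: in an OS dataflow each PE keeps one output accumulator (the membrane potential) in a local register while input spikes stream past, so with a single timestep the potential never needs to be written back to memory. A minimal behavioral sketch under those assumptions (the function and parameter names are illustrative, not the paper's interface):

```python
def os_single_timestep(spike_indices, weights, threshold):
    """Output-stationary, single-timestep fully-connected layer sketch.

    spike_indices: compressed list of input neurons that fired.
    weights[i][j]: synaptic weight from input i to output j.
    Each output's accumulator stays local ("stationary") while spikes
    stream by; thresholding happens once at the end, so the membrane
    potential is never stored to or reloaded from memory.
    """
    n_out = len(weights[0])
    acc = [0.0] * n_out                 # one local accumulator per PE/output
    for i in spike_indices:             # spikes arrive as compressed indices
        for j in range(n_out):
            acc[j] += weights[i][j]     # accumulate into the stationary sum
    return [1 if v >= threshold else 0 for v in acc]
```

The contrast is with dataflows that partially accumulate outputs across multiple passes or timesteps, which forces membrane potentials out to memory between passes.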
🔎 Similar Papers
2021-07-01 · IEEE International Conference on Application-Specific Systems, Architectures, and Processors · Citations: 13
Kainan Wang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China
Chengyi Yang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China
Chengting Yu
Zhejiang University
Yee Sin Ang
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore
Bo Wang
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore
Aili Wang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China