STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of poor parallelism, low data reuse, high latency, and suboptimal energy efficiency in hardware acceleration of Spiking Neural Networks (SNNs)—stemming from irregular spike timing—this work presents the first FPGA accelerator for single-timestep inference under resource-constrained conditions. The approach employs algorithm-hardware co-design: (1) a novel single-timestep SNN inference architecture; (2) a timing-aware pruning algorithm built upon Temporal Efficient Training (TET); (3) an Output-Stationary (OS) dataflow with a compressed, sorted spike buffer; and (4) support for depthwise separable convolutions alongside inter- and intra-layer parallelism. Experimental results demonstrate 0.14 GOPS/W/PE energy efficiency, significantly reduced membrane-potential storage overhead and memory accesses, and substantially improved inference speed, all with no loss of accuracy.
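The summary above mentions a pruning algorithm built on Temporal Efficient Training (TET). As background for readers unfamiliar with TET, a minimal sketch of the TET loss idea follows: the classification loss is applied at every timestep and averaged, rather than once on the time-averaged output, which keeps each individual timestep informative and is what makes aggressive timestep reduction viable. This is the standard TET formulation, not the paper's specific pruning variant, and all names here are illustrative.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def tet_loss(outputs_per_step, label):
    """TET-style loss: average the per-timestep classification losses,
    instead of classifying only the time-averaged output. Each timestep
    is pushed toward the correct class on its own, so inference can
    later be truncated to fewer timesteps (here: a single one)."""
    return float(np.mean([cross_entropy(o, label) for o in outputs_per_step]))
```

With uniform logits over C classes the loss reduces to ln(C) at every timestep, so the average is ln(C) as well, which is a convenient sanity check.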

📝 Abstract
Brain-inspired Spiking Neural Networks (SNNs) have attracted attention for their event-driven characteristics and high energy efficiency. However, the temporal dependency and irregularity of spikes present significant challenges for hardware parallel processing and data reuse, causing some existing accelerators to fall short in processing latency and energy efficiency. To overcome these challenges, we introduce the STI-SNN accelerator, designed for resource-constrained applications with high energy efficiency, flexibility, and low latency. The accelerator is designed through algorithm and hardware co-design. Firstly, STI-SNN can perform inference in a single timestep. At the algorithm level, we introduce a temporal pruning approach based on the temporal efficient training (TET) loss function. This approach alleviates spike disappearance during timestep reduction, maintains inference accuracy, and expands TET's applicability. In hardware design, we analyze data access patterns and adopt the output stationary (OS) dataflow, eliminating membrane-potential storage and the associated memory accesses. Furthermore, based on the OS dataflow, we propose a compressed and sorted representation of spikes, which is then cached in the line buffer to reduce memory access cost and improve reuse efficiency. Secondly, STI-SNN supports different convolution methods. By adjusting the computation mode of processing elements (PEs) and parameterizing the computation array, STI-SNN can accommodate lightweight models based on depthwise separable convolutions (DSCs), further enhancing hardware flexibility. Lastly, STI-SNN also supports both inter-layer and intra-layer parallel processing. For inter-layer parallelism, we ...
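The abstract describes a compressed, sorted spike representation that is cached in a line buffer. A minimal sketch of the idea, assuming the representation simply keeps the coordinates of fired neurons in raster order (the paper's exact encoding is not specified here, and `compress_spikes` is an illustrative name):

```python
def compress_spikes(spike_map):
    """Compress a binary spike map to a sorted list of active coordinates.

    Spikes are sparse, so storing only the (row, col) indices of fired
    neurons, sorted in raster order, lets a line buffer stream them
    sequentially instead of scanning the full dense map.
    """
    coords = [(r, c)
              for r, row in enumerate(spike_map)
              for c, v in enumerate(row) if v]
    return sorted(coords)
```

For a sparse map, the compressed list touches far fewer memory words than the dense array, which is the reuse/bandwidth benefit the abstract is pointing at.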
Problem

Research questions and friction points this paper is trying to address.

Overcoming temporal dependency and spike irregularity in SNNs
Enhancing energy efficiency and reducing processing latency
Supporting flexible convolution methods and parallel processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-timestep inference with temporal pruning
Output stationary dataflow for memory efficiency
Flexible convolution methods and parallel processing
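To make the "output stationary dataflow for memory efficiency" point concrete: in an OS dataflow each PE keeps one output accumulator (the membrane potential) in a local register while input spikes stream past, so with a single timestep the potential never needs to be written back to memory. A minimal behavioral sketch under those assumptions (the function and parameter names are illustrative, not the paper's interface):

```python
def os_single_timestep(spike_indices, weights, threshold):
    """Output-stationary, single-timestep fully-connected layer sketch.

    spike_indices: compressed list of input neurons that fired.
    weights[i][j]: synaptic weight from input i to output j.
    Each output's accumulator stays local ("stationary") while spikes
    stream by; thresholding happens once at the end, so the membrane
    potential is never stored to or reloaded from memory.
    """
    n_out = len(weights[0])
    acc = [0.0] * n_out                 # one local accumulator per PE/output
    for i in spike_indices:             # spikes arrive as compressed indices
        for j in range(n_out):
            acc[j] += weights[i][j]     # accumulate into the stationary sum
    return [1 if v >= threshold else 0 for v in acc]
```

The contrast is with dataflows that partially accumulate outputs across multiple passes or timesteps, which forces membrane potentials out to memory between passes.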
🔎 Similar Papers
2021-07-01 · IEEE International Conference on Application-Specific Systems, Architectures, and Processors · Citations: 13
Kainan Wang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China
Chengyi Yang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China
Chengting Yu
Zhejiang University
Yee Sin Ang
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore
Bo Wang
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore
Aili Wang
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China