Hardware Efficient Accelerator for Spiking Transformer With Reconfigurable Parallel Time Step Computing

📅 2025-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high latency and power consumption of multi-timestep spiking Transformer inference, this work presents the first low-power hardware accelerator for this model class. Methodologically, it introduces a novel tick-batching dataflow enabling full-timestep parallelism and a reconfigurable timestep neuron architecture; replaces residual addition with IAND gates at the model level to enable end-to-end all-spike computation; and adopts a memory-free spike processing paradigm that eliminates membrane potential storage while supporting vectorized 3×3/1×1 convolutions and matrix operations. Implemented in 28 nm CMOS, the accelerator occupies only 198.46K logic gates and 139.25 KB of SRAM. Operating at 500 MHz, it achieves 3.456 TSOPS throughput and 38.334 TSOPS/W energy efficiency, a substantial gain for spiking Transformer hardware.

📝 Abstract
This paper introduces the first low-power hardware accelerator for Spiking Transformers, an emerging alternative to traditional artificial neural networks. By modifying the base Spikformer model to use IAND instead of residual addition, the model exclusively uses spike computation. The hardware employs a fully parallel tick-batching dataflow and a timestep-reconfigurable neuron architecture, addressing the delay and power challenges of multi-timestep processing in spiking neural networks. This approach processes outputs from all time steps in parallel, reducing computation delay and eliminating membrane memory, thereby lowering energy consumption. The accelerator supports 3×3 and 1×1 convolutions and matrix operations through vectorized processing, meeting model requirements. Implemented in TSMC's 28 nm process, it achieves 3.456 TSOPS (tera spike operations per second) with a power efficiency of 38.334 TSOPS/W at 500 MHz, using 198.46K logic gates and 139.25 KB of SRAM.
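The paper's exact neuron microarchitecture is not reproduced here, but the memory-free, all-timesteps-in-parallel idea can be illustrated with a simple soft-reset integrate-and-fire sketch: when the input currents for every timestep are available up front (as a tick-batching dataflow provides), the full spike train follows from a cumulative sum, with no membrane potential stored between steps. The function name and the soft-reset/non-negative-current assumptions are illustrative, not taken from the paper.

```python
import numpy as np

def if_spikes_all_timesteps(currents, theta=1.0):
    """Spikes for all T timesteps of a soft-reset IF neuron in one
    vectorized pass -- no membrane potential is carried between steps.
    Assumes non-negative input currents; clipping keeps at most one
    spike per timestep."""
    charge = np.cumsum(currents, axis=0)    # total charge up to each step
    fired = np.floor(charge / theta)        # total spikes emitted so far
    spikes = np.diff(fired, prepend=0.0)    # new spikes at each step
    return np.clip(spikes, 0.0, 1.0)

# Example: T = 4 timesteps for a single neuron
print(if_spikes_all_timesteps([0.5, 0.6, 0.2, 0.9]))  # [0. 1. 0. 1.]
```

The sequential dependence that normally forces timestep-by-timestep evaluation (and membrane storage) collapses into the `cumsum`, which is what makes processing all timesteps in parallel attractive in hardware.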
Problem

Research questions and friction points this paper is trying to address.

Develop a low-power hardware accelerator for Spiking Transformers
Address the delay and power challenges of multi-timestep processing
Enable efficient parallel processing of all time steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconfigurable parallel time step computing
IAND replaces residual addition
Fully parallel tick-batching dataflow
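The IAND replacement for residual addition (as in SEW-style spiking residual blocks) can be sketched elementwise: IAND(s, x) = (NOT s) AND x. Unlike an integer residual sum, the result stays binary, which is what lets the whole datapath remain all-spike and the connection be realized with plain IAND gates. The function name below is illustrative.

```python
import numpy as np

def iand_residual(branch_spikes, shortcut_spikes):
    """Elementwise IAND: (NOT branch) AND shortcut.
    The output is still a 0/1 spike map, unlike residual addition,
    whose sums would break the all-spike datapath."""
    return (1 - branch_spikes) * shortcut_spikes

branch = np.array([0, 1, 0, 1])
shortcut = np.array([1, 1, 0, 0])
print(iand_residual(branch, shortcut))  # [1 0 0 0]
```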
Bo-Yu Chen
National Taiwan University
music information retrieval, human-computer interaction, deep learning
Tian-Sheuan Chang
Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan