STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
A core limitation of spiking neural networks (SNNs) is their significantly weaker spatiotemporal modeling capability compared to artificial neural networks (ANNs), hindering practical deployment. To address this, we propose STAA-SNN—a novel SNN framework featuring a spike-driven self-attention mechanism designed explicitly for spike sequences. It incorporates position encoding to model latent temporal dependencies, step-level attention aggregation for adaptive feature weighting across timesteps, and timestep-wise random dropout for regularization. Furthermore, it employs spike-driven backpropagation for efficient training. Evaluated on the neuromorphic dataset CIFAR10-DVS, STAA-SNN achieves state-of-the-art (SOTA) performance. On static image benchmarks—CIFAR-10, CIFAR-100, and ImageNet—it attains top accuracies of 97.14%, 82.05%, and 70.40%, respectively, surpassing prior SNNs by 0.33–2.80% while requiring fewer timesteps. These results substantially narrow the accuracy gap between SNNs and ANNs.

📝 Abstract
Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge hindering the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we are the first to incorporate position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN effectively captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework demonstrates exceptional performance across diverse datasets and exhibits strong generalization capabilities. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS, and reaches 97.14%, 82.05%, and 70.40% on the static datasets CIFAR-10, CIFAR-100, and ImageNet, respectively. Furthermore, our model improves accuracy by 0.33% to 2.80% over prior SNNs while using fewer time steps. The code for the model is available on GitHub.
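The abstract mentions a time-step random dropout strategy but gives no formula on this page. As a minimal illustrative sketch only (the function name, shapes, and inverted-dropout rescaling are assumptions, not the paper's implementation), the general idea of zeroing entire timesteps of a spike train during training can be written as:

```python
import numpy as np

def timestep_dropout(spikes, p, rng):
    """Hypothetical sketch: drop whole timesteps with probability p.

    spikes: (T, N) array of spike activations over T timesteps.
    Surviving timesteps are rescaled by 1/(1-p) so the expected
    activation is preserved, as in standard inverted dropout.
    """
    keep = rng.random(spikes.shape[0]) >= p               # (T,) boolean mask
    mask = keep[:, None].astype(spikes.dtype) / (1.0 - p)  # broadcast over neurons
    return spikes * mask

# Toy usage on a random binary spike train
rng = np.random.default_rng(42)
x = (rng.random((6, 5)) < 0.5).astype(float)  # T=6 timesteps, N=5 neurons
y = timestep_dropout(x, p=0.5, rng=rng)
```

Because each timestep is either fully zeroed or rescaled as a whole, the network cannot over-rely on any single step, which is one plausible reading of how such a strategy helps escape local optima.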
Problem

Research questions and friction points this paper is trying to address.

The performance gap between SNNs and ANNs
Capturing spatial-temporal dependencies in SNNs
Limited accuracy and generalization on neuromorphic datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spike-driven self-attention mechanism for SNNs
Position encoding for latent temporal relationships
Step attention for spatial-temporal aggregation
Time-step random dropout to avoid local optima
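The step attention listed above is not specified in detail on this page. As a hedged sketch under stated assumptions (a softmax over learned per-timestep relevance scores, followed by a weighted sum of per-timestep features; every name and shape here is assumed, not taken from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def step_attention(features, scores):
    """Hypothetical sketch of step-level attention aggregation.

    features: (T, D) per-timestep feature vectors from the SNN.
    scores:   (T,) learned relevance logits, one per timestep.
    Returns a (D,) aggregate that up-weights informative timesteps.
    """
    weights = softmax(scores)   # (T,), non-negative, sums to 1
    return weights @ features   # convex combination of timestep features

# Toy usage: T=4 timesteps, D=8 features
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
scores = np.array([0.1, 2.0, -1.0, 0.5])
out = step_attention(feats, scores)
```

The design choice this illustrates is that aggregation across timesteps is adaptive rather than a fixed mean: timesteps with higher learned scores contribute more to the final representation.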
Topics: Computer Vision, Multimodal Learning, Spiking Neural Network

👥 Authors
Tianqing Zhang (Zhejiang University)
Kairong Yu (Zhejiang University)
Xian Zhong (Wuhan University of Technology)
Hongwei Wang (Zhejiang University)
Qi Xu (Dalian University of Technology)
Qiang Zhang (Dalian University of Technology)