Collapse or Preserve: Data-Dependent Temporal Aggregation for Spiking Neural Network Acceleration

📅 2026-03-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently exploiting the fine-grained, unstructured sparsity inherent in binary spikes of spiking neural networks (SNNs), particularly on SIMD-based GPUs where existing sparse computation methods fall short. The authors propose Temporal Aggregated Convolution (TAC), which pre-aggregates spikes across K time steps to reduce convolution invocations. For event-based data, they further introduce TAC-TP to preserve critical temporal information. Challenging the common assumption that β€œsparsity implies efficiency,” the study advocates a data-dependent temporal aggregation strategy: compressing temporal dimensions for rate-coded inputs to boost both speed and accuracy, while retaining full temporal resolution for event data. Experiments demonstrate that TAC achieves a 13.8Γ— speedup with improved accuracy on MNIST and Fashion-MNIST, while TAC-TP reduces convolution calls by 50% and attains 95.1% accuracy on DVS128-Gesture.
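The pre-aggregation step rests on the linearity of convolution: convolving the sum of $K$ spike frames equals summing the $K$ per-frame convolution results, so one convolution call can stand in for $K$. A minimal numpy sketch of this identity (a 1-D stand-in, not the paper's mlx-snn implementation; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, L = 8, 4, 32                     # time steps, aggregation window, signal length
kernel = rng.normal(size=5)

# T binary spike frames (rate-coded input), i.i.d. Bernoulli
spikes = (rng.random((T, L)) < 0.2).astype(np.float64)

# Baseline: one convolution per time step -> T calls
per_step = np.stack([np.convolve(s, kernel, mode="same") for s in spikes])

# TAC-style: pre-aggregate K frames, then convolve once per group -> T/K calls
groups = spikes.reshape(T // K, K, L).sum(axis=1)
aggregated = np.stack([np.convolve(g, kernel, mode="same") for g in groups])

# Linearity: conv(sum of frames) == sum of conv(frame)
summed = per_step.reshape(T // K, K, L).sum(axis=1)
assert np.allclose(aggregated, summed)
```

The assertion holds exactly (up to floating point) because convolution is a linear operator; the accuracy question the paper studies is whether downstream layers can tolerate, or even benefit from, this temporal collapse.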

πŸ“ Abstract
Spike sparsity is widely believed to enable efficient spiking neural network (SNN) inference on GPU hardware. We demonstrate this is an illusion: five distinct sparse computation strategies on Apple M3 Max all fail to outperform dense convolution, because SIMD architectures cannot exploit the fine-grained, unstructured sparsity of i.i.d. binary spikes. Instead, we propose Temporal Aggregated Convolution (TAC), which exploits convolution linearity to pre-aggregate $K$ spike frames before a single convolution call, reducing $T$ calls to $T/K$. On rate-coded data, TAC achieves a 13.8× speedup with +1.6% accuracy on MNIST and +5.4% on Fashion-MNIST -- a simultaneous improvement in both speed and accuracy. However, on event-based data where the temporal dimension carries genuine motion information, TAC's temporal collapse is harmful. We therefore introduce TAC-TP (Temporal Preservation), which shares each group's convolution output across $K$ independent LIF steps, preserving full temporal resolution for downstream layers. On DVS128-Gesture, TAC-TP achieves 95.1% accuracy (vs. 96.3% baseline) with 50% fewer convolution calls, while standard TAC drops to 91.3%. Our key finding is that the optimal temporal aggregation strategy is data-dependent: collapse the temporal dimension for rate-coded data (noise reduction) but preserve it for event data (information retention). The speedup is hardware-agnostic: TAC achieves 11.0× on NVIDIA V100, confirming the mechanism transfers across GPU architectures. All operators in the mlx-snn library are open source.
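The TAC-TP variant described above computes one convolution per group but still runs $K$ separate membrane updates on that shared output, so the spiking dynamics keep full temporal resolution. A hedged sketch of the shared-input idea, using a generic hard-reset LIF update (the exact neuron dynamics and reset rule in the paper may differ; all names here are hypothetical):

```python
import numpy as np

def lif_steps(current, K, tau=2.0, v_th=1.0):
    """Run K independent LIF updates driven by one shared input current.

    `current` is the convolution output for a K-step group, computed once
    (T/K conv calls in total) but reused for K membrane updates, so the
    layer still emits K spike frames per group.
    """
    v = np.zeros_like(current)
    out = []
    for _ in range(K):
        v = v + (current - v) / tau           # leaky integration toward input
        s = (v >= v_th).astype(current.dtype)  # threshold crossing -> spike
        v = v * (1.0 - s)                      # hard reset where spiked
        out.append(s)
    return np.stack(out)                       # shape (K, ...) - resolution kept

# Example: constant drive of 1.2 crosses threshold on the third step
frames = lif_steps(np.full(4, 1.2), K=3)
```

With tau=2.0 the membrane voltage climbs 0.6 → 0.9 → 1.05, so the first spike appears in the third frame; downstream layers see $K$ distinct frames even though only one convolution was computed for the group.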
Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Network
Temporal Aggregation
Rate-coded Data
Event-based Data
Inference Acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Aggregation
Spiking Neural Networks
Data-Dependent Strategy
Convolution Acceleration
Event-Based Vision