🤖 AI Summary
Existing ANN-to-SNN conversion methods rely on long temporal sequences (T ≫ 1), incurring high latency and computational overhead. To address this, we propose an efficient single-timestep (T = 1) conversion framework grounded in temporal-to-spatial equivalence theory. Our approach introduces Scale-and-Fire neurons and a Spiking Transformer architecture, augmented with an adaptive scaling-and-firing mechanism, multi-threshold neurons, and spike-pattern optimization via attention-distribution alignment. Evaluated on ImageNet-1K, our method achieves 88.8% top-1 accuracy, substantially surpassing prior conversion techniques, while also attaining state-of-the-art performance on object detection and instance segmentation benchmarks. This work is the first realization of high-accuracy, general-purpose SNN conversion at T = 1, breaking the conventional temporal-depth bottleneck and establishing a new paradigm for ultra-low-latency neuromorphic inference.
📝 Abstract
Spiking Neural Networks (SNNs) are gaining attention as energy-efficient alternatives to Artificial Neural Networks (ANNs), especially in resource-constrained settings. While ANN-to-SNN conversion (ANN2SNN) achieves high accuracy without end-to-end SNN training, existing methods rely on large time steps, leading to high inference latency and computational cost. In this paper, we propose a theoretical and practical framework for single-timestep ANN2SNN. We establish the Temporal-to-Spatial Equivalence Theory, proving that multi-timestep integrate-and-fire (IF) neurons can be equivalently replaced by single-timestep multi-threshold neurons (MTN). Based on this theory, we introduce the Scale-and-Fire Neuron (SFN), which enables effective single-timestep ($T=1$) spiking through adaptive scaling and firing. Furthermore, we develop the SFN-based Spiking Transformer (SFormer), a specialized instantiation of SFN within Transformer architectures, where spike patterns are aligned with attention distributions to mitigate the computational, energy, and hardware overhead of the multi-threshold design. Extensive experiments on image classification, object detection, and instance segmentation demonstrate that our method achieves state-of-the-art performance under single-timestep inference. Notably, we achieve 88.8% top-1 accuracy on ImageNet-1K at $T=1$, surpassing existing conversion methods.
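The core temporal-to-spatial idea can be illustrated with a minimal sketch (our own simplification for intuition, not the paper's implementation): an integrate-and-fire neuron driven by a constant input over $T$ timesteps with soft reset emits the same total output as a single-timestep neuron that compares the accumulated charge against $T$ graded thresholds $\theta, 2\theta, \dots, T\theta$.

```python
def if_neuron_multistep(x, theta, T):
    """Standard IF neuron simulated over T timesteps.

    Receives constant input current x each step; fires at most one spike
    per step when the membrane potential reaches theta, with soft reset
    (subtract theta). Returns the total output (spike count * theta).
    """
    v = 0.0
    spikes = 0
    for _ in range(T):
        v += x
        if v >= theta:
            spikes += 1
            v -= theta
    return spikes * theta


def mtn_single_step(x, theta, T):
    """Hypothetical single-timestep multi-threshold neuron.

    One comparison of the total charge x*T against graded thresholds
    theta, 2*theta, ..., T*theta reproduces the multistep spike count
    in a single forward pass (capped at T, the IF neuron's rate limit).
    """
    total = x * T                      # charge the IF neuron integrates over T steps
    k = min(int(total // theta), T)    # number of graded thresholds crossed
    return max(k, 0) * theta           # negative inputs fire no spikes
```

For example, with `x = 0.5`, `theta = 1.0`, `T = 4`, both functions return `2.0`: the multistep neuron fires on steps 2 and 4, while the single-step neuron crosses two of its four graded thresholds. This mirrors, in a toy setting, why multi-threshold neurons can trade temporal depth for spatial (threshold) resolution.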