🤖 AI Summary
To address the significant accuracy gap between spiking neural networks (SNNs) and artificial neural networks (ANNs) in object detection, this paper proposes SpikeYOLO—a novel SNN architecture designed for efficient object detection. Methodologically, it introduces two key innovations: (i) SpikeYOLO, an architecture that simplifies the vanilla YOLO design and incorporates meta SNN blocks, avoiding the spike degradation caused by directly converting YOLO's overly complex modules into spiking form; and (ii) a new spiking neuron that activates integer values during training to reduce quantization error, then expands them into binary spikes over extended virtual timesteps at inference, preserving spike-driven computation. Evaluated on the static COCO dataset, SpikeYOLO achieves 66.2% mAP@50 and 48.9% mAP@50:95—surpassing the prior state-of-the-art SNN by +15.0% and +18.7%, respectively. On the neuromorphic Gen1 event camera dataset, it attains 67.2% mAP@50, outperforming an ANN with equivalent architecture by +2.5% while delivering a 5.7× improvement in energy efficiency.
📝 Abstract
Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around network architecture and spiking neuron design. First, the overly complex module design causes spike degradation when the YOLO series is converted to the corresponding spiking version. We design a SpikeYOLO architecture to solve this problem by simplifying the vanilla YOLO and incorporating meta SNN blocks. Second, object detection is more sensitive to quantization errors in the conversion of membrane potentials into binary spikes by spiking neurons. To address this challenge, we design a new spiking neuron that activates integer values during training while maintaining spike-driven inference by extending virtual timesteps. The proposed method is validated on both static and neuromorphic object detection datasets. On the static COCO dataset, we obtain 66.2% mAP@50 and 48.9% mAP@50:95, which is +15.0% and +18.7% higher than the prior state-of-the-art SNN, respectively. On the neuromorphic Gen1 dataset, we achieve 67.2% mAP@50, which is +2.5% greater than the ANN with equivalent architecture, and the energy efficiency is improved by 5.7×. Code: https://github.com/BICLab/SpikeYOLO
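The core neuron idea — emit integer activation values during training, then unroll them into binary spikes across virtual timesteps at inference so that computation stays spike-driven — can be illustrated with a minimal NumPy sketch. The function names and the maximum integer value `D` below are illustrative assumptions, not the paper's implementation (which also uses surrogate gradients through the rounding step during backpropagation):

```python
import numpy as np

def integer_activation(membrane, D=4):
    """Training-time activation: round the membrane potential to an
    integer spike count, clipped to the range [0, D]."""
    return np.clip(np.round(membrane), 0, D)

def expand_to_binary_spikes(int_spikes, D=4):
    """Inference-time expansion: unroll each integer count into D virtual
    timesteps of binary (0/1) spikes, so every emitted value is a spike."""
    t = np.arange(D).reshape(-1, *([1] * int_spikes.ndim))
    return (t < int_spikes).astype(np.float32)

# Example: four membrane potentials from some layer.
x = np.array([0.2, 1.7, 3.4, 9.0])
ints = integer_activation(x)            # array([0., 2., 3., 4.])
spikes = expand_to_binary_spikes(ints)  # shape (4, 4); column j holds ints[j] ones
```

Summing the binary spike train over the virtual timesteps recovers the integer activation exactly, which is why the two views are equivalent in total synaptic input while the inference-time form remains purely binary.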