🤖 AI Summary
Existing spiking neural networks (SNNs) for object detection suffer from limited temporal representation capability because monotonous input encoding schemes, such as frame averaging or repetitive copying, deliver nearly static neuronal stimulation across time steps. To address this, we propose the Temporal Dynamics Enhancer (TDE), comprising a Spiking Encoder that diversifies the stimuli fed to each time step and an Attention Gating Module that models inter-temporal dependencies; a Spike-Driven Attention mechanism further eliminates the high-energy multiplications that attention would otherwise introduce. TDE is trainable end to end and achieves both high accuracy and low energy consumption. Experiments demonstrate state-of-the-art performance: 57.7% mAP₅₀₋₉₅ on the static PASCAL VOC dataset and 47.6% mAP₅₀₋₉₅ on the neuromorphic EvDET200K dataset, with attention-related energy consumption reduced to only 24.0% of that of conventional attention modules. This work advances both the temporal modeling capability and the energy efficiency of SNNs in complex vision tasks.
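The energy claim rests on a simple property of spike-driven computation: when queries and keys are binary spike tensors, attention scores reduce to counts of co-active positions, which need only logical ANDs and additions rather than floating-point multiplications. The toy sketch below illustrates that equivalence; it is not the paper's actual SDA design, and the tensor shapes and random spikes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 4, 8  # illustrative: number of time steps, feature dimension

# Binary spike tensors standing in for attention queries and keys.
q = (rng.random((T, D)) > 0.5).astype(np.int64)
k = (rng.random((T, D)) > 0.5).astype(np.int64)

# Conventional attention scores: a float matmul, i.e. T*T*D
# multiply-accumulate operations.
scores_mul = q.astype(float) @ k.astype(float).T

# Spike-driven variant: since entries are 0/1, each score is just the
# count of positions where both spike trains fire -- computable with
# AND plus a popcount-style sum, using no multiplications at all.
scores_add = np.array([[int(np.sum(q[i] & k[j])) for j in range(T)]
                       for i in range(T)])

print(scores_add)
```

Both routes produce identical score matrices; the spike-driven one simply replaces every multiply with cheaper accumulate operations, which is where the reported energy savings come from.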
📝 Abstract
Spiking Neural Networks (SNNs), with their brain-inspired spatiotemporal dynamics and spike-driven computation, have emerged as promising energy-efficient alternatives to Artificial Neural Networks (ANNs). However, existing SNNs typically replicate inputs directly or aggregate them into frames at fixed intervals. Such strategies lead to neurons receiving nearly identical stimuli across time steps, severely limiting the model's expressive power, particularly in complex tasks like object detection. In this work, we propose the Temporal Dynamics Enhancer (TDE) to strengthen SNNs' capacity for temporal information modeling. TDE consists of two modules: a Spiking Encoder (SE) that generates diverse input stimuli across time steps, and an Attention Gating Module (AGM) that guides the SE's generation based on inter-temporal dependencies. Moreover, to eliminate the high-energy multiplication operations introduced by the AGM, we propose Spike-Driven Attention (SDA) to reduce attention-related energy consumption. Extensive experiments demonstrate that TDE can be seamlessly integrated into existing SNN-based detectors and consistently outperforms state-of-the-art methods, achieving mAP₅₀₋₉₅ scores of 57.7% on the static PASCAL VOC dataset and 47.6% on the neuromorphic EvDET200K dataset. In terms of energy consumption, the SDA consumes only 24.0% of the energy of conventional attention modules.
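The motivating problem can be seen with a minimal leaky integrate-and-fire (LIF) simulation: if a static frame is simply copied to every time step, the neuron's spike train locks into a fixed periodic pattern, whereas a time-varying encoding yields richer dynamics. The sketch below is a toy illustration under assumed parameters (`tau`, `v_th`, the sinusoidal modulation); it is not the paper's Spiking Encoder.

```python
import numpy as np

def lif_spikes(inputs, tau=2.0, v_th=1.0):
    """Simulate a single LIF neuron over T steps; return its binary spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v = v / tau + x                  # leaky integration of the input current
        s = 1.0 if v >= v_th else 0.0    # fire when the membrane crosses threshold
        spikes.append(s)
        v = v * (1.0 - s)                # hard reset after a spike
    return np.array(spikes)

T = 8
frame = 0.6  # pixel intensity of a single static frame

# Monotonous encoding: the same frame value is replicated at every step,
# so the neuron receives identical stimuli and settles into a rigid cycle.
static_train = lif_spikes(np.full(T, frame))

# A toy time-varying encoder (a stand-in for a learned Spiking Encoder)
# gives each step a distinct stimulus and hence a different spike pattern.
dynamic_train = lif_spikes(frame * (1.0 + 0.5 * np.sin(np.arange(T))))

print(static_train, dynamic_train)  # static train repeats [0, 0, 1, ...]
```

Under replicated input the spike train is fully determined by one scalar, so extra time steps add computation without adding information; step-wise diverse stimuli are what make the temporal dimension expressive.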