🤖 AI Summary
This work addresses two obstacles: video understanding models are too computationally intensive to deploy on mobile devices, and conventional spiking neural networks (SNNs) applied to temporal action detection (TAD) need long conversion timesteps and suffer significant performance degradation. To overcome these issues, the authors propose SpikeTAD, the first end-to-end SNN architecture specifically designed for TAD, which applies SNNs to this task directly without relying on rate-based conversion. By co-optimizing temporal encoding and network architecture, SpikeTAD combines the energy efficiency of neuromorphic computing with high action localization accuracy. Experiments show that SpikeTAD achieves an average mAP of 67.2% on THUMOS14 and 37.42% on ActivityNet-1.3 while keeping power consumption extremely low, which indicates that SNNs can be competitive on complex video understanding tasks.
📝 Abstract
Video understanding is a crucial part of computer vision, with numerous application scenarios. As mobile devices become more popular, a growing number of efforts aim to deploy video understanding models on them. However, existing video understanding models are difficult to deploy because of their large size and prohibitive power consumption. Spiking Neural Networks (SNNs) offer biological plausibility and low-power advantages over Artificial Neural Networks (ANNs), especially on neuromorphic chips, which are regarded as essential components of future mobile devices. However, excessively long conversion time-steps and severe performance degradation limit their application. To solve these problems, we explore the application of SNNs to temporal action detection (TAD), an important task in video understanding, and propose the first SNN-based end-to-end TAD architecture, coined SpikeTAD. While maintaining extremely low power consumption, SpikeTAD achieves an average mAP of 67.2% on THUMOS14 and 37.42% on ActivityNet-1.3, demonstrating the feasibility of a low-power TAD model. Our code is available at https://github.com/MCG-NJU/SpikeTAD.
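As context for the "average mAP" figures, TAD benchmarks typically score a predicted segment against a ground-truth segment by temporal IoU (tIoU), compute mAP at several tIoU thresholds, and average the results. The sketch below is a minimal illustration of that idea, not code from the SpikeTAD repository. The function names are hypothetical, and the threshold list and per-threshold mAP values in the usage lines are made-up placeholders, since the abstract does not state them.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    # Overlap length is clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """A prediction counts as a hit at a threshold if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


def average_map(map_per_threshold):
    """Average mAP: the mean of the mAP values computed at each tIoU threshold."""
    values = list(map_per_threshold.values())
    return sum(values) / len(values)


# Illustrative usage with placeholder numbers (not results from the paper).
example_map = {0.3: 0.8, 0.5: 0.6, 0.7: 0.4}
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
print(average_map(example_map))
```

A stricter threshold demands tighter boundary localization, so averaging across thresholds rewards models that are accurate at both coarse and fine overlap levels.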