AI Summary
To address the limited long-term temporal modeling capability of Spiking Neural Networks (SNNs) in event-camera-based human action recognition, this paper proposes two frameworks, TS-SNN (temporal segment-based SNN) and 3D-SNN (3D convolutional SNN), presented as the first systematic approach to modeling long-range temporal dynamics within SNNs. It also designs a spatiotemporal joint encoding mechanism that combines event-stream processing with 3D convolution for efficient spatiotemporal feature extraction. In addition, it introduces FallingDetection-CeleX, the first high-resolution event-based dataset specifically designed for fall detection. Extensive experiments on FallingDetection-CeleX and three benchmark neuromorphic datasets (N-MNIST, DVS128 Gesture, ASL-DVS) show that the proposed frameworks consistently outperform existing SNN approaches, with average accuracy improvements of 5.2% to 9.7% on long-sequence action recognition.
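The summary does not say how the raw event stream is encoded before it reaches the SNN. A common choice, assumed here purely for illustration, is to bin events into a fixed number of time slices with separate ON/OFF polarity channels, giving a T × 2 × H × W volume that a 3D convolution can treat as spatiotemporal input. The function name `events_to_frames` and the `(t, x, y, p)` event layout are hypothetical, not taken from the paper:

```python
from typing import List, Tuple

# Hypothetical event layout: (t, x, y, p) = timestamp, column, row, polarity in {0, 1}.
Event = Tuple[int, int, int, int]


def events_to_frames(events: List[Event], num_bins: int, height: int, width: int,
                     t_start: int, t_end: int) -> List[List[List[List[int]]]]:
    """Accumulate events into a [num_bins][2][height][width] grid of event counts.

    Each time bin covers an equal slice of [t_start, t_end); channel 0 holds
    OFF events (p = 0) and channel 1 holds ON events (p = 1).
    """
    if t_end <= t_start or num_bins <= 0:
        raise ValueError("need t_end > t_start and num_bins > 0")
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    duration = t_end - t_start
    for t, x, y, p in events:
        if not (t_start <= t < t_end):
            continue  # ignore events outside the time window
        b = (t - t_start) * num_bins // duration  # time-bin index in [0, num_bins)
        frames[b][p][y][x] += 1
    return frames
```

Counting events per bin discards exact timestamps inside a bin; finer binning keeps more temporal detail at the cost of more SNN time steps.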
Abstract
This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). Event cameras capture only the outlines of motion, and SNNs are well suited to processing spatiotemporal data through spikes; together, these properties make the two highly compatible for event-based HAR. Previous studies, however, have been limited by the ability of SNNs to process long-term temporal information, which is essential for accurate HAR. In this paper, we introduce two novel frameworks to address this: a temporal segment-based SNN (\textit{TS-SNN}) and a 3D convolutional SNN (\textit{3D-SNN}). The \textit{TS-SNN} extracts long-term temporal information by dividing actions into shorter segments, while the \textit{3D-SNN} replaces 2D spatial components with 3D components to facilitate the transmission of temporal information. To promote further research in event-based HAR, we create a dataset, \textit{FallingDetection-CeleX}, collected using the high-resolution CeleX-V event camera $(1280 \times 800)$ and comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, demonstrating their effectiveness in handling long-range temporal information for event-based HAR.
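The abstract describes TS-SNN only as dividing an action into shorter segments. The following is a minimal sketch of that idea under stated assumptions: it borrows the segment-then-average consensus used by temporal segment networks, scores each segment independently with a toy spiking model whose membrane state starts from zero in every segment, and averages the per-class scores. The helpers `split_into_segments`, `lif_spike_count`, `toy_segment_model` and `segment_consensus` are hypothetical and are not the paper's architecture:

```python
from typing import Callable, List, Sequence

# Hypothetical helpers illustrating a segment-then-aggregate scheme;
# this is not the paper's TS-SNN implementation.


def split_into_segments(frames: Sequence[float], num_segments: int) -> List[Sequence[float]]:
    """Divide a length-T sequence into num_segments contiguous, near-equal chunks."""
    total = len(frames)
    if not 0 < num_segments <= total:
        raise ValueError("num_segments must be between 1 and len(frames)")
    bounds = [i * total // num_segments for i in range(num_segments + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def lif_spike_count(inputs: Sequence[float], tau: float = 2.0, v_th: float = 1.0) -> int:
    """Count the spikes of one leaky integrate-and-fire (LIF) neuron.

    The membrane potential starts at 0, moves toward the input each step
    (v <- v + (x - v) / tau), and is hard-reset to 0 after reaching v_th.
    """
    v, spikes = 0.0, 0
    for x in inputs:
        v = v + (x - v) / tau  # leaky integration toward the input
        if v >= v_th:
            spikes += 1
            v = 0.0  # hard reset after firing
    return spikes


def toy_segment_model(segment: Sequence[float]) -> List[float]:
    """Hypothetical two-class scorer: one LIF neuron per class with a fixed input gain."""
    gains = [0.5, 2.0]
    return [float(lif_spike_count([g * x for x in segment])) for g in gains]


def segment_consensus(frames: Sequence[float], num_segments: int,
                      segment_model: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Score each segment independently, then average the per-class scores."""
    scores = [segment_model(seg) for seg in split_into_segments(frames, num_segments)]
    num_classes = len(scores[0])
    return [sum(s[c] for s in scores) / len(scores) for c in range(num_classes)]
```

Because each segment starts from a fresh membrane state, a neuron only has to integrate over a short window, while the averaged scores still cover the whole action; whether TS-SNN resets state or aggregates segments in exactly this way is not stated in the abstract.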