Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition

πŸ“… 2025-03-21
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limited long-term temporal modeling capability of Spiking Neural Networks (SNNs) in event-camera-based human action recognition, this paper proposes a dual-path framework comprising TS-SNN (Temporal Segmentation SNN) and 3D-SNN (3D-convolutional SNN), the first systematic approach to modeling long-range temporal dynamics within SNNs. We design a spatiotemporal joint encoding mechanism that integrates event-stream processing with 3D convolution for efficient spatiotemporal feature extraction. Furthermore, we introduce FallingDetection-CeleXβ€”the first high-resolution event-based dataset specifically designed for fall detection. Extensive experiments on FallingDetection-CeleX and three benchmark neuromorphic datasets (NMNIST, DVS128 Gesture, ASL-DVS) demonstrate that our method consistently outperforms existing SNN approaches, achieving average accuracy improvements of 5.2%–9.7% for long-sequence action recognition.

Technology Category

Application Category

πŸ“ Abstract
This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). The unique feature of event cameras in capturing only the outlines of motion, combined with SNNs' proficiency in processing spatiotemporal data through spikes, establishes a highly synergistic compatibility for event-based HAR. Previous studies, however, have been limited by SNNs' ability to process long-term temporal information, essential for precise HAR. In this paper, we introduce two novel frameworks to address this: temporal segment-based SNN ( extit{TS-SNN}) and 3D convolutional SNN ( extit{3D-SNN}). The extit{TS-SNN} extracts long-term temporal information by dividing actions into shorter segments, while the extit{3D-SNN} replaces 2D spatial elements with 3D components to facilitate the transmission of temporal information. To promote further research in event-based HAR, we create a dataset, extit{FallingDetection-CeleX}, collected using the high-resolution CeleX-V event camera $(1280 imes 800)$, comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, showcasing their effectiveness in handling long-range temporal information for event-based HAR.
Problem

Research questions and friction points this paper is trying to address.

Enhancing SNNs for long-term temporal action recognition
Integrating event cameras with SNNs for privacy-preserving HAR
Developing new SNN frameworks to process spatiotemporal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal segment-based SNN for long-term action recognition
3D convolutional SNN for temporal information transmission
High-resolution event camera dataset for HAR validation
πŸ”Ž Similar Papers
No similar papers found.
Siyuan Yang
Siyuan Yang
Wallenberg-NTU Presidential Postdoctoral Fellowship, Nanyang Technological University
Computer VisionAction Recognition
Shilin Lu
Shilin Lu
Nanyang Technological University
Generative Models
S
Shizheng Wang
Institute of Microelectronics, Chinese Academy of Sciences, China
M
Meng Hwa Er
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Z
Zengwei Zheng
Department of Computer Science and Computing, Zhejiang University City College, China
A
A.C. Kot
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore