Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition

📅 2025-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited long-term temporal modeling capability of Spiking Neural Networks (SNNs) in event-camera-based human action recognition, this paper proposes two frameworks: TS-SNN (temporal segment-based SNN) and 3D-SNN (3D convolutional SNN). TS-SNN divides an action into shorter segments to capture long-range temporal structure, while 3D-SNN replaces 2D spatial components with 3D ones so that temporal information propagates through the network. The paper also introduces FallingDetection-CeleX, a high-resolution event-based dataset for fall detection. Experiments on FallingDetection-CeleX and three benchmark neuromorphic datasets (NMNIST, DVS128 Gesture, ASL-DVS) show that the method outperforms existing SNN approaches, with reported average accuracy improvements of 5.2%–9.7% for long-sequence action recognition.

📝 Abstract

This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). The unique feature of event cameras in capturing only the outlines of motion, combined with SNNs' proficiency in processing spatiotemporal data through spikes, establishes a highly synergistic compatibility for event-based HAR. Previous studies, however, have been limited by SNNs' ability to process long-term temporal information, essential for precise HAR. In this paper, we introduce two novel frameworks to address this: temporal segment-based SNN (TS-SNN) and 3D convolutional SNN (3D-SNN). The TS-SNN extracts long-term temporal information by dividing actions into shorter segments, while the 3D-SNN replaces 2D spatial elements with 3D components to facilitate the transmission of temporal information. To promote further research in event-based HAR, we create a dataset, FallingDetection-CeleX, collected using the high-resolution CeleX-V event camera $(1280 \times 800)$, comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, showcasing their effectiveness in handling long-range temporal information for event-based HAR.
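The segment-based idea behind TS-SNN can be illustrated with a minimal sketch. The abstract does not specify how segments are formed, so the equal-duration split, the `(timestamp, x, y, polarity)` event layout, and the function name `split_into_segments` below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp, x, y, polarity).
Event = Tuple[float, int, int, int]


def split_into_segments(events: List[Event], num_segments: int) -> List[List[Event]]:
    """Partition a time-sorted event stream into equal-duration segments.

    Assumption: segments have equal duration; the paper may use another rule.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be >= 1")
    segments: List[List[Event]] = [[] for _ in range(num_segments)]
    if not events:
        return segments

    t_start = events[0][0]
    t_end = events[-1][0]
    # Fall back to 1.0 when all events share one timestamp (avoids dividing by zero).
    duration = (t_end - t_start) or 1.0

    for ev in events:
        idx = int((ev[0] - t_start) / duration * num_segments)
        # The final timestamp maps to num_segments, so clamp it into the last segment.
        idx = min(idx, num_segments - 1)
        segments[idx].append(ev)
    return segments


# Five events spread over 4 time units, split into 2 segments.
demo_events: List[Event] = [(0.0, 1, 1, 1), (1.0, 2, 1, 0), (2.0, 3, 2, 1),
                            (3.0, 4, 2, 1), (4.0, 5, 3, 0)]
demo_segments = split_into_segments(demo_events, 2)
```

In a pipeline of this kind, each short segment would presumably be processed by the SNN on its own and the per-segment outputs combined into one action prediction; that aggregation step is likewise an assumption and is not shown here.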

Problem

Research questions and friction points this paper is trying to address.

Enhancing SNNs for long-term temporal action recognition

Integrating event cameras with SNNs for privacy-preserving HAR

Developing new SNN frameworks to process spatiotemporal data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal segment-based SNN for long-term action recognition

3D convolutional SNN for temporal information transmission

High-resolution event camera dataset for HAR validation
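To make the 2D-versus-3D distinction concrete, here is a naive pure-Python sketch of a 3D convolution over a `[time][height][width]` volume. It only illustrates why a kernel with temporal extent lets each output mix information from neighbouring time steps; it contains no spiking neurons and is not the 3D-SNN architecture itself. The name `conv3d_valid` and the toy data are hypothetical.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [t][y][x]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Naive 'valid' 3D cross-correlation over (time, height, width)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The dt loop is what a 2D convolution lacks: it sums across time steps.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input: three 2x2 frames, where frame t is filled with the value t + 1.
frames: Volume = [[[float(t + 1)] * 2 for _ in range(2)] for t in range(3)]
ones_kernel: Volume = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]  # 2x2x2
mixed = conv3d_valid(frames, ones_kernel)
```

With a temporal kernel size of 1, the same function reduces to a per-frame 2D convolution, so each output would depend on a single time step only.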
