🤖 AI Summary
To address the privacy leakage that frame-based cameras introduce into human action recognition, this work proposes a lightweight 3D-CNN method leveraging event-camera data. Because event streams are sparse and respond only to changes in pixel intensity, the approach avoids capturing identity-revealing visual content, thereby enabling inherently privacy-preserving action recognition. Methodologically, we design a compact 3D convolutional architecture suitable for edge deployment; incorporate focal loss with class reweighting to mitigate the long-tailed class distribution; and apply spatiotemporal data augmentation to enhance robustness. Evaluated on a composite event-based dataset, our model achieves 94.17% accuracy and an F1-score of 0.9415, outperforming baseline models, including C3D and ResNet3D, by up to 3 percentage points. The proposed solution delivers high accuracy, low computational overhead, and strong privacy guarantees, making it particularly suitable for privacy-sensitive real-world applications.
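The class-reweighted focal loss mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the general technique (Lin et al.'s focal loss with per-class α weights); the γ value and the way α is chosen below are assumptions for illustration, not parameters reported by the paper:

```python
import numpy as np

def focal_loss(probs, labels, alpha, gamma=2.0):
    """Class-reweighted focal loss (illustrative sketch).

    probs  : (N, C) softmax probabilities
    labels : (N,) integer class ids
    alpha  : (C,) per-class weights, e.g. inverse class frequency
             (how the paper sets these is an assumption here)
    gamma  : focusing parameter; gamma=0 recovers weighted cross-entropy
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    a_t = alpha[labels]                           # per-class reweighting factor
    # (1 - p_t)^gamma down-weights easy, already well-classified examples,
    # so gradient signal concentrates on hard and rare-class samples
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 2, a confidently correct prediction (p_t = 0.9) contributes far less to the loss than an uncertain one (p_t = 0.4), which is what counteracts the long-tailed class distribution.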
📝 Abstract
This paper presents a lightweight three-dimensional convolutional neural network (3D-CNN) for human activity recognition (HAR) using event-based vision data. Privacy preservation is a key challenge in human monitoring systems, as conventional frame-based cameras capture identifiable personal information. In contrast, event cameras record only changes in pixel intensity, providing an inherently privacy-preserving sensing modality. The proposed network effectively models both spatial and temporal dynamics while maintaining a compact design suitable for edge deployment. To address class imbalance and enhance generalization, focal loss with class reweighting and targeted data augmentation strategies are employed. The model is trained and evaluated on a composite dataset derived from the Toyota Smart Home and ETRI datasets. Experimental results demonstrate an F1-score of 0.9415 and an overall accuracy of 94.17%, outperforming benchmark 3D-CNN architectures such as C3D, ResNet3D, and MC3_18 by up to 3 percentage points. These results highlight the potential of event-based deep learning for developing accurate, efficient, and privacy-aware human action recognition systems suitable for real-world edge applications.
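Before a 3D-CNN can process a sparse event stream, the events must be densified into a spatiotemporal tensor. A common choice is a polarity-signed voxel grid; the abstract does not state which encoding the paper uses, so the sketch below is an illustrative assumption:

```python
import numpy as np

def events_to_voxel(events, H, W, T):
    """Accumulate an event stream into a (T, H, W) voxel grid.

    events : (N, 4) array of (t, x, y, polarity), with timestamps t
             normalized to [0, 1) and polarity in {-1, +1}.
    Only signed intensity *changes* are stored, so no appearance or
    identity information from the scene survives in the tensor.
    """
    grid = np.zeros((T, H, W), dtype=np.float32)
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    bins = np.clip((t * T).astype(int), 0, T - 1)  # temporal bin per event
    np.add.at(grid, (bins, y, x), p)               # signed accumulation
    return grid
```

The resulting (T, H, W) tensor (plus a channel axis) is the kind of input a 3D convolution consumes, with the T axis playing the role that the frame axis plays for RGB video.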