🤖 AI Summary
To address the privacy leakage that frame-based cameras introduce into human action recognition, this work proposes a lightweight 3D-CNN method leveraging event-camera data. Because event streams are sparse and respond only to changes in pixel intensity, the approach avoids capturing identity-revealing visual content, thereby enabling inherently privacy-preserving action recognition. Methodologically, we design a compact 3D convolutional architecture suitable for edge deployment; incorporate focal loss with class reweighting to mitigate the long-tailed class distribution; and apply spatiotemporal data augmentation to enhance robustness. Evaluated on a composite event-based dataset, our model achieves 94.17% accuracy and an F1-score of 0.9415, outperforming baseline models, including C3D and ResNet3D, by up to 3 percentage points. The proposed solution delivers high accuracy, low computational overhead, and strong privacy guarantees, making it particularly suitable for privacy-sensitive real-world applications.
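The class-reweighted focal loss mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the general technique (Lin et al.'s focal loss with per-class α weights); the γ value and the way α is chosen below are assumptions for illustration, not parameters reported by the paper:

```python
import numpy as np

def focal_loss(probs, labels, alpha, gamma=2.0):
    """Class-reweighted focal loss (illustrative sketch).

    probs  : (N, C) softmax probabilities
    labels : (N,) integer class ids
    alpha  : (C,) per-class weights, e.g. inverse class frequency
             (how the paper sets these is an assumption here)
    gamma  : focusing parameter; gamma=0 recovers weighted cross-entropy
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    a_t = alpha[labels]                           # per-class reweighting factor
    # (1 - p_t)^gamma down-weights easy, already well-classified examples,
    # so gradient signal concentrates on hard and rare-class samples
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 2, a confidently correct prediction (p_t = 0.9) contributes far less to the loss than an uncertain one (p_t = 0.4), which is what counteracts the long-tailed class distribution.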
📝 Abstract
This paper presents a lightweight three-dimensional convolutional neural network (3D-CNN) for human activity recognition (HAR) using event-based vision data. Privacy preservation is a key challenge in human monitoring systems, as conventional frame-based cameras capture identifiable personal information. In contrast, event cameras record only changes in pixel intensity, providing an inherently privacy-preserving sensing modality. The proposed network effectively models both spatial and temporal dynamics while maintaining a compact design suitable for edge deployment. To address class imbalance and enhance generalization, focal loss with class reweighting and targeted data augmentation strategies are employed. The model is trained and evaluated on a composite dataset derived from the Toyota Smart Home and ETRI datasets. Experimental results demonstrate an F1-score of 0.9415 and an overall accuracy of 94.17%, outperforming benchmark 3D-CNN architectures such as C3D, ResNet3D, and MC3_18 by up to 3 percentage points. These results highlight the potential of event-based deep learning for developing accurate, efficient, and privacy-aware human action recognition systems suitable for real-world edge applications.
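Before a 3D-CNN can process a sparse event stream, the events must be densified into a spatiotemporal tensor. A common choice is a polarity-signed voxel grid; the abstract does not state which encoding the paper uses, so the sketch below is an illustrative assumption:

```python
import numpy as np

def events_to_voxel(events, H, W, T):
    """Accumulate an event stream into a (T, H, W) voxel grid.

    events : (N, 4) array of (t, x, y, polarity), with timestamps t
             normalized to [0, 1) and polarity in {-1, +1}.
    Only signed intensity *changes* are stored, so no appearance or
    identity information from the scene survives in the tensor.
    """
    grid = np.zeros((T, H, W), dtype=np.float32)
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    bins = np.clip((t * T).astype(int), 0, T - 1)  # temporal bin per event
    np.add.at(grid, (bins, y, x), p)               # signed accumulation
    return grid
```

The resulting (T, H, W) tensor (plus a channel axis) is the kind of input a 3D convolution consumes, with the T axis playing the role that the frame axis plays for RGB video.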