Federated Action Recognition for Smart Worker Assistance Using FastPose

📅 2025-08-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the three challenges of privacy preservation, real-time inference, and cross-user generalization in skeleton-based action recognition for smart manufacturing, this paper proposes a federated learning framework tailored to industrial scenarios. It pairs a modified FastPose pose estimator with two temporal backbones, an LSTM and a Transformer encoder, and trains them across multiple clients using weighted FedAvg and federated ensemble learning (FedEnsemble) for privacy-preserving collaborative training. On the global test set, the federated Transformer and FedEnsemble exceed centralized training by 12.4 and 16.3 percentage points in accuracy, respectively. On an unseen external client, the gains over centralized training rise to 52.6 and 58.3 percentage points, indicating substantially better cross-user transfer in heterogeneous industrial environments.

📝 Abstract

In smart manufacturing environments, accurate and real-time recognition of worker actions is essential for productivity, safety, and human-machine collaboration. While skeleton-based human activity recognition (HAR) offers robustness to lighting, viewpoint, and background variations, most existing approaches rely on centralized datasets, which are impractical in privacy-sensitive industrial scenarios. This paper presents a federated learning (FL) framework for pose-based HAR using a custom skeletal dataset of eight industrially relevant upper-body gestures, captured from five participants and processed using a modified FastPose model. Two temporal backbones, an LSTM and a Transformer encoder, are trained and evaluated under four paradigms: centralized, local (per-client), FL with weighted federated averaging (FedAvg), and federated ensemble learning (FedEnsemble). On the global test set, the FL Transformer improves over centralized training by +12.4 percentage points, and FedEnsemble delivers a gain of +16.3 percentage points. On an unseen external client, FL and FedEnsemble exceed centralized accuracy by +52.6 and +58.3 percentage points, respectively. These results demonstrate that FL not only preserves privacy but also substantially enhances cross-user generalization, establishing it as a practical solution for scalable, privacy-aware HAR in heterogeneous industrial settings.
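The abstract names weighted federated averaging (FedAvg) but does not spell out the aggregation step. The following is a minimal sketch of the standard sample-count-weighted average, not the paper's implementation. The `Params` type, the function name `weighted_fedavg`, and the flat-list parameter layout are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def weighted_fedavg(client_params: Sequence[Params],
                    client_sizes: Sequence[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    merged: Params = {}
    for name, reference in client_params[0].items():
        # Each merged weight is sum_k (n_k / total) * w_k.
        merged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


# Two toy clients holding 1 and 3 samples respectively.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_params = weighted_fedavg(clients, [1, 3])
print(global_params)  # {'w': [2.5, 3.5]}
```

Weighting by sample count means a client with more local data pulls the global model further toward its own parameters, which matters when the five participants contribute unequal amounts of data.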

Problem

Research questions and friction points this paper is trying to address.

Federated learning for privacy-preserving worker action recognition

Skeleton-based HAR in industrial settings without centralized data

Improving cross-user generalization in human activity recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for privacy-preserving action recognition

FastPose model with LSTM and Transformer backbones

Federated ensemble learning enhances cross-user generalization
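The summary does not describe how FedEnsemble combines its models at inference time. One common ensembling scheme, assumed here purely for illustration, averages the class probabilities produced by several models and picks the class with the highest mean. The function name `ensemble_predict` and the example probabilities are hypothetical.

```python
from typing import Sequence


def ensemble_predict(prob_vectors: Sequence[Sequence[float]]) -> int:
    """Average per-model class probabilities and return the argmax class index."""
    if not prob_vectors:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all models must score the same number of classes")

    # Mean probability per class across the ensemble members.
    mean = [sum(p[c] for p in prob_vectors) / len(prob_vectors)
            for c in range(n_classes)]
    # Index of the class with the highest mean probability.
    return max(range(n_classes), key=mean.__getitem__)


# Three toy models scoring one sample over three gesture classes.
votes = [
    [0.6, 0.4, 0.0],
    [0.1, 0.7, 0.2],
    [0.2, 0.5, 0.3],
]
print(ensemble_predict(votes))  # 1
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh two uncertain ones, which is one plausible reason an ensemble could generalize better to an unseen client.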

🔎 Similar Papers

C3T: Cross-modal Transfer Through Time for Human Action Recognition