Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models

📅 2024-03-13
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
Industrial Human Action Recognition (IHAR) faces three critical bottlenecks: high deployment cost, poor cross-scenario generalization, and insufficient real-time performance. To address these, this work integrates Large-Scale Foundation Models (LSFMs) with model lightweighting techniques in an end-to-end real-time IHAR framework. Specifically, self-supervised LSFMs generate high-quality pseudo-labels to drastically reduce manual annotation effort; a lightweight temporal feature extractor and an edge-cloud collaborative inference architecture jointly enable low-latency, high-accuracy recognition. Evaluated on real production lines, the framework reduces labeling cost by over 70%, achieves end-to-end latency under 50 ms, attains >92% accuracy, and supports cross-line transfer. Deployed across three distinct manufacturing scenarios, it moves away from the conventional reliance on extensive manual labeling and high-compute infrastructure, delivering a scalable, production-ready solution for large-scale industrial deployment.

📝 Abstract
Industrial management, including quality control and cost and safety optimization, relies heavily on high-quality industrial human action recognition (IHAR), which has been hard to implement in large-scale industrial scenes because of its high cost and poor real-time performance. In this paper, we propose a large-scale foundation model (LSFM)-based IHAR method in which various LSFMs and lightweight methods are used jointly, for the first time, to achieve low-cost dataset establishment and real-time IHAR. Comprehensive tests on in-situ large-scale industrial manufacturing lines showed that the proposed method greatly reduced labor costs, delivered superior real-time performance, and achieved satisfactory accuracy and generalization, indicating its strong potential as a backbone IHAR method, especially for large-scale industrial applications.
Problem

Research questions and friction points this paper is trying to address.

High-cost deployment in industrial action recognition
Poor cross-scenario generalization for operational gestures
Limited real-time performance in industrial environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated dataset labeling using Grounding DINO
Real-time action detection with YOLOv5
LoRA-based fine-tuning for Vision Transformer classification
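
One way the automated labeling step could feed the YOLOv5 detector is sketched below: boxes from an open-vocabulary detector such as Grounding DINO are written as YOLO-format label lines (class index followed by normalized center x, center y, width, height). This is an illustrative sketch, not the authors' pipeline. It assumes the detections have already been converted to pixel-space corner boxes, and the names to_yolo_label and pseudo_labels and the 0.35 score threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed input format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
# Assumed detection record: (class_id, box, confidence score).
Detection = Tuple[int, Box, float]


def to_yolo_label(class_id: int, box: Box, img_w: int, img_h: int) -> str:
    """Convert one pixel-space corner box into a YOLO-format label line."""
    x_min, y_min, x_max, y_max = box
    # Clamp to the image so slightly out-of-frame boxes stay valid.
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)
    # YOLO labels store the box center and size, normalized to [0, 1].
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def pseudo_labels(
    detections: Sequence[Detection],
    img_w: int,
    img_h: int,
    min_score: float = 0.35,  # hypothetical threshold, tune per dataset
) -> List[str]:
    """Keep confident detections and emit one label line per box."""
    return [
        to_yolo_label(class_id, box, img_w, img_h)
        for class_id, box, score in detections
        if score >= min_score
    ]


if __name__ == "__main__":
    dets = [(0, (100, 200, 300, 400), 0.90), (1, (0, 0, 10, 10), 0.10)]
    for line in pseudo_labels(dets, img_w=640, img_h=480):
        print(line)
```

Filtering by confidence before writing labels is one simple guard against noisy pseudo-labels; a human spot-check of a sample would still be advisable before training on them.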
Wensheng Liang
School of Mechanical Engineering and Automation, Northeastern University, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, China
Ruiyan Zhuang
Industrial AI Technology Research Institute, Midea Intelligent Manufacturing Research Center, Shunde District, Foshan, 528311, Guangdong, China
Xianwei Shi
Industrial AI Technology Research Institute, Midea Intelligent Manufacturing Research Center, Shunde District, Foshan, 528311, Guangdong, China
Shuai Li
Foshan Graduate School of Innovation, Northeastern University, Shunde District, Foshan, 528311, Guangdong, China
Zhicheng Wang
College of Information Science and Engineering, Northeastern University, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, China
Xiaoguang Ma
College of Information Science and Engineering, Northeastern University, Wenhua Road, Heping District, Shenyang, 110819, Liaoning, China