Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models

📅 2024-03-13

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

career value

274K/year

🤖 AI Summary

Industrial Human Activity Recognition (IHAR) faces three critical bottlenecks: high deployment cost, poor cross-scenario generalization, and insufficient real-time performance. To address these, this work pioneers the synergistic integration of Large-Scale Foundation Models (LSFMs) and model lightweighting techniques into an end-to-end real-time IHAR framework. Specifically, self-supervised LSFMs generate high-quality pseudo-labels to drastically reduce manual annotation effort; a lightweight temporal feature extractor and an edge-cloud collaborative inference architecture jointly enable low-latency, high-accuracy recognition. Evaluated on real production lines, the framework reduces labeling cost by over 70%, achieves end-to-end latency under 50 ms, attains >92% accuracy, and supports cross-line transfer. Deployed across three distinct manufacturing scenarios, it breaks the conventional paradigm reliant on extensive manual labeling and high-compute infrastructure, delivering a scalable, production-ready solution for large-scale industrial deployment.

Technology Category

Application Category

📝 Abstract

Industrial managements, including quality control, cost and safety optimization, etc., heavily rely on high quality industrial human action recognitions (IHARs) which were hard to be implemented in large-scale industrial scenes due to their high costs and poor real-time performance. In this paper, we proposed a large-scale foundation model(LSFM)-based IHAR method, wherein various LSFMs and lightweight methods were jointly used, for the first time, to fulfill low-cost dataset establishment and real-time IHARs. Comprehensive tests on in-situ large-scale industrial manufacturing lines elucidated that the proposed method realized great reduction on employment costs, superior real-time performance, and satisfactory accuracy and generalization capabilities, indicating its great potential as a backbone IHAR method, especially for large-scale industrial applications.

Problem

Research questions and friction points this paper is trying to address.

High-cost deployment in industrial action recognition

Poor cross-scenario generalization for operational gestures

Limited real-time performance in industrial environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated dataset labeling using Grounding DINO

Real-time action detection with YOLOv5

LoRA-based fine-tuning for Vision Transformer classification

🔎 Similar Papers

C3T: Cross-modal Transfer Through Time for Human Action Recognition