🤖 AI Summary
Industrial Human Action Recognition (IHAR) faces three critical bottlenecks: high deployment cost, poor cross-scenario generalization, and insufficient real-time performance. To address these, this work pioneers the synergistic integration of Large-Scale Foundation Models (LSFMs) and model lightweighting techniques into an end-to-end real-time IHAR framework. Specifically, self-supervised LSFMs generate high-quality pseudo-labels to drastically reduce manual annotation effort; a lightweight temporal feature extractor and an edge-cloud collaborative inference architecture jointly enable low-latency, high-accuracy recognition. Evaluated on real production lines, the framework reduces labeling cost by over 70%, achieves end-to-end latency under 50 ms, attains >92% accuracy, and supports cross-line transfer. Deployed across three distinct manufacturing scenarios, it breaks the conventional paradigm reliant on extensive manual labeling and high-compute infrastructure, delivering a scalable, production-ready solution for large-scale industrial deployment.
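The summary does not say how pseudo-labels are accepted or rejected, so the following is only a minimal sketch of one common approach, confidence thresholding, and not the authors' method. The function name `select_pseudo_labels`, the 0.9 threshold, and the example probabilities are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int, float]]:
    """Keep clips whose top class probability meets the threshold.

    Returns a list of (clip_index, predicted_class, confidence).
    The threshold is an assumed value, not one taken from the paper.
    """
    selected = []
    for idx, clip_probs in enumerate(probs):
        confidence = max(clip_probs)
        label = list(clip_probs).index(confidence)
        if confidence >= threshold:
            selected.append((idx, label, confidence))
    return selected


# Hypothetical per-clip class probabilities, standing in for the
# output of a foundation model on three unlabeled video clips.
example_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]

auto_labeled = select_pseudo_labels(example_probs, threshold=0.9)
accepted = {clip_index for clip_index, _, _ in auto_labeled}
# Clips below the threshold would go to a human annotator instead.
needs_review = [i for i in range(len(example_probs)) if i not in accepted]
```

Under this assumed scheme, only the low-confidence clips reach a human annotator, which is one way a pipeline could cut manual labeling effort.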
📝 Abstract
Industrial management, including quality control and cost and safety optimization, relies heavily on high-quality industrial human action recognition (IHAR), which has been hard to implement in large-scale industrial scenes because of its high cost and poor real-time performance. In this paper, we propose a large-scale foundation model (LSFM)-based IHAR method in which various LSFMs and lightweight methods are jointly used, for the first time, to achieve low-cost dataset establishment and real-time IHAR. Comprehensive tests on in-situ large-scale industrial manufacturing lines showed that the proposed method achieves a large reduction in employment costs, superior real-time performance, and satisfactory accuracy and generalization capability, indicating its great potential as a backbone IHAR method, especially for large-scale industrial applications.