🤖 AI Summary
This study addresses the lack of systematic evaluation of robotic foundation models (RFMs) for industrial applications, where critical requirements such as safety, real-time performance, heterogeneous perception, and edge deployment remain inadequately assessed. To close this gap, the work proposes an assessment framework that structures industrial deployment needs into eleven interdependent implications and 149 concrete criteria, and applies it to 324 manipulation-capable RFMs. The evaluation pipeline combines conservative, LLM-assisted scoring with validation against expert judgments, yielding 48,276 fine-grained, criterion-level decisions across multiple capability dimensions. The findings reveal that current RFMs exhibit limited and uneven industrial maturity: even the highest-rated models satisfy only a fraction of the criteria and typically show narrow, implication-specific strengths rather than integrated coverage. The results underscore the need to systematically integrate safety, real-time feasibility, robust perception, interaction, and cost-effectiveness into future RFM development.
📝 Abstract
Robotic foundation models (RFMs) are emerging as a promising route towards flexible, instruction- and demonstration-driven robot control; however, a critical investigation of their industrial applicability is still lacking. This survey gives an extensive overview of the RFM landscape and analyses, in terms of concrete implications, how industrial domains and use cases shape the requirements on RFMs, with particular focus on collaborative robot platforms, heterogeneous sensing and actuation, edge-computing constraints, and safety-critical operation. We synthesise industrial deployment perspectives into eleven interdependent implications and operationalise them into an assessment framework comprising a catalogue of 149 concrete criteria that span both model capabilities and ecosystem requirements. Using this framework, we evaluate 324 manipulation-capable RFMs through 48,276 criterion-level decisions, obtained from a conservative LLM-assisted evaluation pipeline and validated against expert judgements. The results indicate that industrial maturity is limited and uneven: even the highest-rated models satisfy only a fraction of the criteria and typically exhibit narrow, implication-specific peaks rather than integrated coverage. We conclude that progress towards industry-grade RFMs depends less on isolated benchmark successes than on the systematic incorporation of safety, real-time feasibility, robust perception, interaction, and cost-effective system integration into auditable deployment stacks.
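The reported counts are consistent with a full evaluation grid: each of the 324 models receives one decision on each of the 149 criteria, and 324 × 149 = 48,276. The Python sketch that follows illustrates, under stated assumptions, how such a grid might be aggregated into an overall score and per-implication coverage for one model. The `Verdict` values, the rule that anything short of an explicit "met" counts as unsatisfied (one possible reading of "conservative"), and all criterion and implication names in the toy usage are hypothetical; this is not the survey's actual pipeline.

```python
from enum import Enum

# Counts reported in the abstract.
N_MODELS = 324
N_CRITERIA = 149


class Verdict(Enum):
    """Hypothetical outcome of one LLM-assisted, criterion-level decision."""
    MET = "met"
    NOT_MET = "not_met"
    UNCLEAR = "unclear"  # e.g. the available evidence does not settle the criterion


def conservative(verdict: Verdict) -> bool:
    """Assumed conservative rule: only an explicit MET counts as satisfied."""
    return verdict is Verdict.MET


def coverage(decisions: dict[str, Verdict],
             criterion_to_implication: dict[str, str]) -> dict[str, float]:
    """Fraction of satisfied criteria per implication for one model.

    Criteria with no recorded decision are treated as UNCLEAR, i.e. unsatisfied.
    """
    totals: dict[str, int] = {}
    met: dict[str, int] = {}
    for criterion, implication in criterion_to_implication.items():
        totals[implication] = totals.get(implication, 0) + 1
        if conservative(decisions.get(criterion, Verdict.UNCLEAR)):
            met[implication] = met.get(implication, 0) + 1
    return {impl: met.get(impl, 0) / n for impl, n in totals.items()}


def overall(decisions: dict[str, Verdict],
            criterion_to_implication: dict[str, str]) -> float:
    """Fraction of all criteria a model satisfies under the conservative rule."""
    satisfied = sum(
        conservative(decisions.get(c, Verdict.UNCLEAR))
        for c in criterion_to_implication
    )
    return satisfied / len(criterion_to_implication)


# One decision per (model, criterion) pair yields the reported total.
assert N_MODELS * N_CRITERIA == 48_276

# Toy usage with hypothetical criterion and implication names.
toy_mapping = {
    "safety.protective_stop": "safety",
    "safety.speed_limit": "safety",
    "realtime.control_rate": "real-time",
}
toy_decisions = {
    "safety.protective_stop": Verdict.MET,
    "safety.speed_limit": Verdict.UNCLEAR,
    "realtime.control_rate": Verdict.NOT_MET,
}
print(coverage(toy_decisions, toy_mapping))  # {'safety': 0.5, 'real-time': 0.0}
print(round(overall(toy_decisions, toy_mapping), 3))  # 0.333
```

Counting unclear evidence as unsatisfied biases scores downward, which fits the description of the pipeline as conservative; the toy model's profile (partial safety coverage, no real-time coverage) mirrors the uneven, implication-specific pattern the abstract reports, but how the survey itself resolves borderline cases is not stated here.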