🤖 AI Summary
This work addresses three challenges that industrial robots face in flexible automation: interpreting operator intent, verifying physical feasibility, and recovering from execution failures. It proposes an agentic neuro-symbolic framework for human-in-the-loop industrial robotics. Large language models (LLMs) handle only the tasks that require language understanding or contextual reasoning, while verification, sequencing, and execution remain deterministic. The framework adapts the Planner-Generator-Evaluator (PGE) harness pattern from software engineering into a Specifier-Designer-Inspector (SDI) architecture, combined with LangGraph-based dynamic routing and a two-tier recovery mechanism: context-aware orchestration handles structure-level replanning, and deterministic recovery skills handle execution-level geometric failures. A Unity3D digital twin lets a human inspect, modify, and re-verify plans before physical execution. Evaluated on natural-language commands across multiple difficulty levels against ten baselines, the method achieves the highest task success, and ablation studies indicate that each core component is necessary.
📝 Abstract
Flexible robotic automation requires systems that interpret operator intent, verify physical feasibility, and recover from execution failures across both the planning and execution stages. This paper proposes an agentic neuro-symbolic framework for human-in-the-loop industrial robotics, in which LLMs are used for tasks that require language understanding or contextual reasoning, while all verification, sequencing, and execution remain deterministic. The framework adapts the Planner-Generator-Evaluator (PGE) harness pattern from software engineering into a Specifier-Designer-Inspector (SDI) architecture for industrial robotics, combined with LangGraph-based dynamic routing for failure recovery. A two-tier recovery mechanism addresses structure-level replanning through context-aware orchestration and execution-level geometric failures through deterministic recovery skills. A Unity3D digital twin supports human inspection, modification, and re-verification prior to physical execution. Evaluated on natural-language commands across multiple difficulty levels against ten baselines, the proposed method achieves the highest task success. Ablation results confirm that structured command expansion, symbolic verification, selective LLM routing, and recovery skills are each individually necessary.
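The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses plain Python rather than LangGraph, and every name in it (`State`, `specifier`, `designer`, `inspector`, `route`, `execute`, `KNOWN_OBJECTS`) is a hypothetical stand-in. The mapping of Specifier, Designer, and Inspector to command expansion, plan generation, and symbolic verification is likewise an assumption drawn from the PGE analogy. The toy replan rule and the simulated grasp failure exist only to exercise the two recovery tiers.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names come from the paper.

@dataclass
class State:
    command: str
    spec: dict = field(default_factory=dict)
    plan: list = field(default_factory=list)    # (skill, target) tuples
    issues: list = field(default_factory=list)  # targets that failed verification
    replans: int = 0
    trace: list = field(default_factory=list)   # execution log

KNOWN_OBJECTS = {"bolt", "tray", "bin"}  # toy world model (assumption)

def specifier(state):
    """Stand-in for an LLM call: expand 'move X to Y' into a structured spec."""
    words = state.command.split()
    state.spec = {"object": words[1], "destination": words[3]}
    return state

def designer(state):
    """Build a skill sequence; after a failed inspection, replan to a fallback."""
    dest = state.spec["destination"]
    if state.issues:
        dest = "bin"  # toy structure-level replan (tier 1)
        state.replans += 1
    state.plan = [("pick", state.spec["object"]), ("place", dest)]
    return state

def inspector(state):
    """Deterministic symbolic check: every target must exist in the world model."""
    state.issues = [t for _, t in state.plan if t not in KNOWN_OBJECTS]
    return state

def route(state, max_replans=2):
    """Dynamic routing: choose the next node from the inspection result."""
    if not state.issues:
        return "execute"
    return "designer" if state.replans < max_replans else "abort"

def execute(state, fails_once=frozenset({"pick"})):
    """Run each step; a simulated geometric failure triggers a fixed recovery skill (tier 2)."""
    for skill, target in state.plan:
        if skill in fails_once:
            state.trace.append(f"{skill}:{target}:failed")
            state.trace.append(f"recover:{skill}:{target}")
        state.trace.append(f"{skill}:{target}:ok")
    return state

def run(command):
    state = inspector(designer(specifier(State(command))))
    while True:
        nxt = route(state)
        if nxt == "execute":
            return execute(state)
        if nxt == "abort":
            state.trace.append("abort")
            return state
        state = inspector(designer(state))
```

In this sketch, only `specifier` stands where a language model would be called. Verification, routing, and recovery are ordinary deterministic functions, which mirrors the division of labor the abstract describes. A command such as `run("move bolt to shelf")` fails inspection once, is replanned to the fallback destination, and then executes with one recovery step.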