🤖 AI Summary
This work addresses the challenge of deploying large language models (LLMs) in industrial human–robot collaboration, where non-deterministic output and the absence of safety guarantees make it hard to generate executable robot commands reliably. The authors propose a grammar-constrained two-stage framework. First, a fine-tuned LLM performs contextual reasoning and parameter inference on the natural language instruction. Second, a Structured Language Model (SLM) and a grammar-based canonicalizer constrain the output to a standardized, robot-readable JSON command. A grammar parser then validates each command against a predefined list of executable actions and, when validation fails, issues a corrective prompt and re-engages the LLM. On the HuRIC dataset, the hybrid approach achieves higher command validity than both a fine-tuned API-based LLM and a standalone grammar-driven natural language understanding (NLU) model, which supports safer and more effective industrial human–robot interaction.
📝 Abstract
Human-robot collaboration in industrial settings requires precise and reliable communication to enhance operational efficiency. While Large Language Models (LLMs) understand general language, they often lack the domain-specific rigidity needed for safe and executable industrial commands. To address this gap, this paper introduces a novel grammar-constrained LLM that integrates a grammar-driven Natural Language Understanding (NLU) system with a fine-tuned LLM, which enables both conversational flexibility and the deterministic precision required in robotics. Our method employs a two-stage process. First, a fine-tuned LLM performs high-level contextual reasoning and parameter inference on natural language inputs. Second, a Structured Language Model (SLM) and a grammar-based canonicalizer constrain the LLM's output, forcing it into a standardized symbolic format composed of valid action frames and command elements. This process guarantees that generated commands are valid and structured in a robot-readable JSON format. A key feature of the proposed model is a validation and feedback loop. A grammar parser validates the output against a predefined list of executable robotic actions. If a command is invalid, the system automatically generates corrective prompts and re-engages the LLM. This iterative self-correction mechanism allows the model to recover from initial interpretation errors to improve system robustness. We evaluate our grammar-constrained hybrid model against two baselines: a fine-tuned API-based LLM and a standalone grammar-driven NLU model. Using the Human Robot Interaction Corpus (HuRIC) dataset, we demonstrate that the hybrid approach achieves superior command validity, which promotes safer and more effective industrial human-robot collaboration.
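The validation and feedback loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the action inventory `ALLOWED_ACTIONS`, the JSON layout with `action` and `args` keys, the `llm` callable, and the `max_attempts` limit are all assumptions made for illustration, and a simple required-argument check stands in for the paper's grammar parser.

```python
import json
from typing import Callable, Optional

# Hypothetical inventory of executable actions: frame name -> required arguments.
ALLOWED_ACTIONS = {
    "BRINGING": {"theme", "goal"},
    "TAKING": {"theme"},
    "MOTION": {"goal"},
}


def validate(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (command, errors); command is None when validation fails."""
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(cmd, dict):
        return None, ["output must be a JSON object"]

    action = cmd.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None, [f"unknown action {action!r}; allowed: {sorted(ALLOWED_ACTIONS)}"]

    args = cmd.get("args")
    if not isinstance(args, dict):
        return None, ["'args' must be a JSON object"]

    required = ALLOWED_ACTIONS[action]
    errors = []
    missing = required - set(args)
    extra = set(args) - required
    if missing:
        errors.append(f"missing arguments: {sorted(missing)}")
    if extra:
        errors.append(f"unexpected arguments: {sorted(extra)}")
    return (None, errors) if errors else (cmd, [])


def generate_command(
    instruction: str,
    llm: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Query the model, validate its output, and re-prompt with the errors."""
    prompt = instruction
    for _ in range(max_attempts):
        cmd, errors = validate(llm(prompt))
        if cmd is not None:
            return cmd
        # Corrective prompt: restate the instruction and list what was rejected.
        prompt = (
            f"{instruction}\n"
            f"Your previous output was rejected: {'; '.join(errors)}\n"
            "Return one corrected JSON command."
        )
    return None  # No valid command produced; the caller should not execute anything.
```

Returning `None` after the final failed attempt, rather than passing through the last model output, reflects the safety motivation in the abstract: under this sketch, only a command that passes validation can ever reach the robot.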