🤖 AI Summary
This work addresses the semantic grounding problem in embodied intelligence: the challenge of reliably mapping natural language instructions to physical robot actions in real-world environments. To overcome this limitation, the authors propose a cognitively inspired, tightly coupled architecture that integrates robotic interactive task learning (ITL) with large language models (LLMs). The paper identifies core grounding bottlenecks and unifies online ITL with LLM-based semantic understanding through three components: a cognitive robot architecture, an LLM interface, and a multimodal semantic alignment mechanism. Rather than reporting a full empirical evaluation, the paper points the way to an initial implementation that would close the loop from natural language instruction to physical action execution. This work contributes both a reusable methodology and a concrete technical pathway toward natural, adaptive human–robot collaboration in unstructured physical environments.
📝 Abstract
A long-term goal of Artificial Intelligence is to build a language understanding system that allows a human to collaborate with a physical robot using language that is natural to the human. In this paper, we highlight some of the challenges in doing so and propose a solution that integrates the abilities of a cognitive agent capable of interactive task learning in a physical robot with the linguistic abilities of a large language model. We also point the way to an initial implementation of this approach.