🤖 AI Summary
To address challenges in embodied AI, including difficult multi-agent system development, poor cross-platform portability, and misalignment between large language models (LLMs) and physical execution, this paper proposes RAI, a flexible multi-agent framework. Methodologically, RAI introduces: (1) a unified agent encapsulation mechanism tailored for embodied intelligence, enabling integration of LLMs, the ROS 2 robotics middleware, and diverse simulation environments (e.g., a robot arm manipulator, a tractor controller, and a ROSBot XL digital twin); (2) an LLM interface abstraction layer, a digital twin synchronization mechanism, and a lightweight multi-agent communication protocol; and (3) the first simulation-based embodied multi-task evaluation benchmark. Experiments demonstrate RAI's effectiveness on a physical robot (Husarion ROSBot XL) and in two simulated deployments, covering control capabilities, embodiment effectiveness, and perception. RAI also enabled the authors to identify and address shortcomings of the generative models used for embodied AI.
📝 Abstract
As the capabilities of generative language models have grown, so has interest in embodied AI. This contribution introduces RAI, a framework for creating embodied multi-agent systems for robotics. The framework implements tools for integrating Agents with robotic stacks, Large Language Models, and simulations, and provides out-of-the-box integration with state-of-the-art systems such as ROS 2. It also comes with dedicated mechanisms for the embodiment of Agents. These mechanisms have been tested on a physical robot, the Husarion ROSBot XL, which was coupled with its digital twin for rapid prototyping. They have also been deployed in two simulations: (1) a robot arm manipulator and (2) a tractor controller. All of these deployments have been evaluated in terms of control capabilities, effectiveness of embodiment, and perception ability. The framework has been used successfully to build systems with multiple agents and has demonstrated effectiveness in all the aforementioned tasks. It also made it possible to identify and address shortcomings of the generative models used for embodied AI.