🤖 AI Summary
This work proposes an embodied intelligence framework that enables quadrupedal robots to comprehend natural language instructions and execute tasks autonomously. Centered on a large language model, the framework integrates multimodal perception, including speech, vision, and LiDAR, to achieve environmental understanding, contextual reasoning, and task planning, thereby mapping natural language instructions to executable embodied task units. It introduces a teleoperation-free human-robot collaboration paradigm that supports voice-based interaction and visual feedback, and employs a plugin-based architecture that allows modules to be added in a plug-and-play fashion and the system to be iterated continuously. Evaluated in complex real-world environments, the system demonstrates effective human-robot collaboration and offers a reusable, scalable deployment paradigm for general-purpose, instruction-driven embodied agents.
📝 Abstract
Quadruped robots can traverse a wide range of complex terrains with high flexibility. As highly mobile ground-based intelligent platforms, they can be equipped with modules for navigation control, environmental perception, and intelligent interaction, thereby serving as real-world mobile deployment platforms for various algorithms. In this paper, we introduce Y-BotFrame, an extensible embodied platform that turns a quadruped robot into an intelligent ground assistant. Y-BotFrame integrates multimodal perception capabilities, including speech, vision, and LiDAR, and employs a large language model as the cognitive core for environmental understanding, contextual reasoning, and task planning. The system maps user natural-language instructions into executable embodied task units that the robot can carry out. Y-BotFrame supports natural interaction through voice commands and visual feedback, removing the need for a remote controller and enabling efficient human-robot collaboration. Its highly extensible framework supports plug-and-play integration of new functional modules as well as modular upgrades and iterative development, offering a reference implementation for the real-world deployment of general-purpose, instruction-driven embodied agents. The supplementary video is available at https://xdei-group.github.io/Y-BotFrame/.
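The abstract does not specify how embodied task units are represented or how plug-in modules are dispatched. The following is a minimal Python sketch, under stated assumptions, of one way such an instruction-to-action pipeline with a plug-and-play module registry could be organized: the language model is assumed to return a JSON list of steps, and each step names a skill that some plugin has registered. The names `TaskUnit`, `SkillRegistry`, `parse_plan`, `navigate_to`, and `speak`, as well as the JSON schema, are hypothetical illustrations and are not taken from Y-BotFrame.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskUnit:
    """One executable step produced by the planner (hypothetical schema)."""
    skill: str
    args: Dict[str, object] = field(default_factory=dict)


class SkillRegistry:
    """Hypothetical plugin registry: modules register skills by name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        # Decorator that adds a function to the registry without
        # modifying the dispatcher, which is what makes modules plug-and-play.
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._skills[name] = fn
            return fn
        return decorator

    def execute(self, units: List[TaskUnit]) -> List[str]:
        # Run task units in order; fail loudly if no plugin provides a skill.
        results: List[str] = []
        for unit in units:
            if unit.skill not in self._skills:
                raise KeyError(f"no plugin registered for skill {unit.skill!r}")
            results.append(self._skills[unit.skill](**unit.args))
        return results


def parse_plan(llm_output: str) -> List[TaskUnit]:
    """Parse an assumed JSON plan, e.g. [{"skill": ..., "args": {...}}]."""
    steps = json.loads(llm_output)
    return [TaskUnit(skill=s["skill"], args=s.get("args", {})) for s in steps]


registry = SkillRegistry()


@registry.register("navigate_to")
def navigate_to(target: str) -> str:
    # Placeholder: a real module would call the robot's navigation stack.
    return f"navigated to {target}"


@registry.register("speak")
def speak(text: str) -> str:
    # Placeholder: a real module would drive text-to-speech output.
    return f"said: {text}"


if __name__ == "__main__":
    # Stand-in for a model's plan for "go to the door and tell me when you arrive".
    plan = ('[{"skill": "navigate_to", "args": {"target": "door"}}, '
            '{"skill": "speak", "args": {"text": "I have arrived"}}]')
    print(registry.execute(parse_plan(plan)))
```

Adding a new capability in this sketch only requires registering another decorated function, so the dispatcher stays unchanged as modules are added or swapped. A deployed system would additionally need to validate model output against the available skills and handle failures during physical execution.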