Y-BotFrame: An Extensible Embodied Agent Framework for Quadruped Robot Assistants

📅 2026-06-11

🤖 AI Summary

This work proposes an embodied intelligence framework that enables quadruped robots to understand natural language instructions and execute tasks autonomously. Built around a large language model, the framework integrates multimodal perception (audio, vision, and LiDAR) for environmental understanding, contextual reasoning, and task planning, mapping natural language instructions end-to-end to executable actions. It introduces a teleoperation-free human-robot collaboration paradigm based on voice interaction and visual feedback, and uses a plugin-based architecture so that functional modules can be added in a plug-and-play fashion and the system can be iterated continuously. Evaluated in complex real-world environments, the system demonstrates effective human-robot collaboration and offers a reusable, scalable deployment paradigm for general-purpose, instruction-driven embodied agents.
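The plugin-based, plug-and-play idea can be illustrated with a minimal registry sketch. This is not Y-BotFrame's actual API: the class `PluginRegistry`, its methods `register`/`dispatch`/`available`, and the placeholder modules `navigate` and `speak` are hypothetical names chosen for illustration. The point it shows is that a new functional module can be added by registering a handler, without editing the core dispatch logic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical handler type: a functional module is a callable taking keyword parameters.
Handler = Callable[..., str]


@dataclass
class PluginRegistry:
    """Minimal plug-and-play module registry (illustrative, not Y-BotFrame's API)."""

    _plugins: Dict[str, Handler] = field(default_factory=dict)

    def register(self, name: str) -> Callable[[Handler], Handler]:
        # Decorator: new modules plug in without changes to the core dispatch code.
        def decorator(fn: Handler) -> Handler:
            if name in self._plugins:
                raise ValueError(f"plugin '{name}' already registered")
            self._plugins[name] = fn
            return fn

        return decorator

    def unregister(self, name: str) -> None:
        # Removing a module leaves the rest of the system untouched.
        self._plugins.pop(name, None)

    def available(self) -> List[str]:
        return sorted(self._plugins)

    def dispatch(self, name: str, **params: Any) -> str:
        # Route a task to the module registered under `name`.
        if name not in self._plugins:
            raise KeyError(f"no plugin registered for '{name}'")
        return self._plugins[name](**params)


registry = PluginRegistry()


@registry.register("navigate")
def navigate(target: str) -> str:
    # Placeholder for a navigation-control module (hypothetical).
    return f"navigating to {target}"


@registry.register("speak")
def speak(text: str) -> str:
    # Placeholder for a voice-feedback module (hypothetical).
    return f"saying: {text}"


if __name__ == "__main__":
    print(registry.available())
    print(registry.dispatch("navigate", target="kitchen"))
```

A decorator-based registry is one common way to get plug-and-play behavior in Python; a real robot framework might instead discover plugins from packages or configuration files.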

📝 Abstract

Quadruped robots can traverse a wide range of complex terrains with high flexibility. As highly mobile ground-based intelligent platforms, they can be equipped with modules for navigation control, environmental perception, and intelligent interaction, and can thus serve as real-world mobile deployment platforms for various algorithms. In this paper, we introduce Y-BotFrame, an extensible embodied platform that turns a quadruped robot into an intelligent ground assistant. Y-BotFrame integrates multimodal perception, including speech, vision, and LiDAR, and employs a large language model as the cognitive core for environmental understanding, contextual reasoning, and task planning. The system maps users' natural-language instructions into executable embodied task units that the robot can carry out. Y-BotFrame supports natural interaction through voice commands and visual feedback, removing the need for a remote controller and enabling efficient human-robot collaboration. Its extensible framework supports plug-and-play integration of new functional modules as well as modular upgrades and iterative development, offering a reference implementation for the real-world deployment of general-purpose, instruction-driven embodied agents. The supplementary video is available at https://xdei-group.github.io/Y-BotFrame/.
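One way to picture "mapping natural-language instructions into executable embodied task units" is to have the language model emit a structured plan and then validate it before execution. The sketch below assumes the LLM is prompted to reply with a JSON list of steps of the form `{"skill": ..., "args": {...}}`; the skill names in `KNOWN_SKILLS`, the `TaskUnit` type, and `parse_plan` are hypothetical and not taken from the paper. No LLM is called here; the plan string stands in for model output.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Set

# Hypothetical skills the robot exposes, mapped to their required argument names.
KNOWN_SKILLS: Dict[str, Set[str]] = {
    "navigate": {"target"},
    "capture_image": set(),
    "speak": {"text"},
}


@dataclass
class TaskUnit:
    """One executable step produced from the LLM's plan (illustrative)."""

    skill: str
    args: Dict[str, Any]


def parse_plan(llm_output: str) -> List[TaskUnit]:
    """Convert a JSON plan from the LLM into validated task units.

    Assumes a reply such as [{"skill": "navigate", "args": {"target": "door"}}].
    Unknown skills or missing arguments are rejected before anything executes.
    """
    steps = json.loads(llm_output)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list")
    units: List[TaskUnit] = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: expected an object")
        skill = step.get("skill")
        args = step.get("args", {})
        if skill not in KNOWN_SKILLS:
            raise ValueError(f"step {i}: unknown skill {skill!r}")
        missing = KNOWN_SKILLS[skill] - set(args)
        if missing:
            raise ValueError(f"step {i}: missing args {sorted(missing)}")
        units.append(TaskUnit(skill, dict(args)))
    return units


if __name__ == "__main__":
    plan = (
        '[{"skill": "navigate", "args": {"target": "front door"}},'
        ' {"skill": "capture_image"},'
        ' {"skill": "speak", "args": {"text": "Done"}}]'
    )
    for unit in parse_plan(plan):
        print(unit)
```

Validating the plan against a fixed skill set before execution is a design choice meant to keep a model's malformed or hallucinated steps from reaching the robot's actuators; the paper's own mapping may differ.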

Problem

Research questions and friction points this paper is trying to address.

quadruped robot

embodied agent

natural language instruction

multimodal perception

extensible framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied agent

large language model

multimodal perception