🤖 AI Summary
Current LLM-driven robotic manipulation methods suffer from hallucination in long-horizon tasks and, owing to open-loop, single-pass planning, lack real-time feedback and adaptive re-planning. This paper proposes a zero-shot multi-agent LLM architecture that decouples high-level task planning from low-level control code generation and introduces a dynamic coordination agent that enables closed-loop decision-making, online re-planning, and failure recovery based on visual and state observations. The framework requires no pre-trained skills, in-context examples, or fine-tuning; it relies solely on zero-shot prompting and iterative refinement via environmental feedback. Evaluated on nine long-horizon RLBench tasks, it achieves successful zero-shot execution, significantly improving robustness, generalization, and task completion rates. The approach addresses two critical bottlenecks in LLM-based robotic manipulation: hallucination mitigation and real-time adaptability.
📄 Abstract
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotic manipulation and navigation. While recent efforts in robotics have leveraged LLMs for both high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to plans being generated in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM), that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions between them. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotic manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.
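The abstract describes a closed loop in which a planner agent proposes subgoals, a coder agent emits control code, and a supervisor agent inspects post-step observations to decide whether to continue, re-plan, or stop. The control flow of such a loop can be sketched as below; this is a minimal illustration under stated assumptions, not the MALMM implementation, and all function names (`planner_agent`, `coder_agent`, `supervisor_agent`, `run_episode`) and the LLM-free stub logic are hypothetical.

```python
def planner_agent(task, observation):
    """High-level planner: returns the next subgoal (LLM call stubbed out)."""
    return {"subgoal": f"next step toward '{task}'", "obs": observation}

def coder_agent(subgoal):
    """Low-level coder: generates control code for one subgoal (stubbed)."""
    return f"execute({subgoal['subgoal']!r})"

def supervisor_agent(observation):
    """Supervisor: inspects the latest observation and manages transitions."""
    if observation.get("failure"):
        return "replan"
    return "done" if observation.get("success") else "continue"

def run_episode(task, env_step, initial_obs, max_steps=10):
    """Closed loop: plan -> generate code -> execute -> observe -> decide."""
    obs = initial_obs
    trace = []
    for _ in range(max_steps):
        subgoal = planner_agent(task, obs)
        code = coder_agent(subgoal)
        obs = env_step(code)              # environment feedback after each step
        decision = supervisor_agent(obs)
        trace.append((code, decision))
        if decision == "done":
            break
        # On "replan" (or "continue"), the next iteration re-invokes the
        # planner with the latest observation, enabling failure recovery.
    return trace
```

The key design point conveyed by the abstract is that the planner is re-invoked after every environment step rather than emitting a full plan in one pass, which is what allows intermediate failures to trigger re-planning instead of silently derailing the episode.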