🤖 AI Summary
Current LLM-driven robotic manipulation methods suffer from hallucination in long-horizon tasks and, owing to open-loop, single-pass planning, lack real-time feedback and adaptive re-planning. This paper proposes a zero-shot multi-agent LLM architecture that decouples high-level task planning from low-level control code generation and introduces a dynamic coordination agent that enables closed-loop decision-making, online re-planning, and failure recovery based on visual and state observations. The framework requires no pre-trained skills, in-context examples, or fine-tuning; it relies solely on zero-shot prompting and iterative refinement via environmental feedback. Evaluated on nine long-horizon RLBench tasks, it achieves successful zero-shot execution, significantly improving robustness, generalization, and task completion rates. The approach addresses two critical bottlenecks in LLM-based robotic manipulation: hallucination mitigation and real-time adaptability.
📄 Abstract
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotic manipulation and navigation. While recent efforts in robotics have leveraged LLMs for both high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to plans being generated in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM), that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions between them. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotic manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.
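The abstract describes a closed loop in which a planner agent proposes subgoals, a coder agent emits control code, and a supervisor agent inspects post-step observations to decide whether to continue, re-plan, or stop. The control flow of such a loop can be sketched as below; this is a minimal illustration under stated assumptions, not the MALMM implementation, and all function names (`planner_agent`, `coder_agent`, `supervisor_agent`, `run_episode`) and the LLM-free stub logic are hypothetical.

```python
def planner_agent(task, observation):
    """High-level planner: returns the next subgoal (LLM call stubbed out)."""
    return {"subgoal": f"next step toward '{task}'", "obs": observation}

def coder_agent(subgoal):
    """Low-level coder: generates control code for one subgoal (stubbed)."""
    return f"execute({subgoal['subgoal']!r})"

def supervisor_agent(observation):
    """Supervisor: inspects the latest observation and manages transitions."""
    if observation.get("failure"):
        return "replan"
    return "done" if observation.get("success") else "continue"

def run_episode(task, env_step, initial_obs, max_steps=10):
    """Closed loop: plan -> generate code -> execute -> observe -> decide."""
    obs = initial_obs
    trace = []
    for _ in range(max_steps):
        subgoal = planner_agent(task, obs)
        code = coder_agent(subgoal)
        obs = env_step(code)              # environment feedback after each step
        decision = supervisor_agent(obs)
        trace.append((code, decision))
        if decision == "done":
            break
        # On "replan" (or "continue"), the next iteration re-invokes the
        # planner with the latest observation, enabling failure recovery.
    return trace
```

The key design point conveyed by the abstract is that the planner is re-invoked after every environment step rather than emitting a full plan in one pass, which is what allows intermediate failures to trigger re-planning instead of silently derailing the episode.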