CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the coordination challenges that partial observability creates in decentralized multi-robot collaboration. It proposes an approach that requires neither communication during inference nor a centralized architecture. Building on a pre-trained vision-language-action (VLA) model, the method introduces an agent identity prompting mechanism, so that all robots share a single policy and make independent decisions based solely on their local observations. The results indicate that a unified VLA policy can coordinate effectively without inter-agent communication or per-robot policies, avoiding the conventional reliance on explicit alignment or information exchange. In real-world experiments, the proposed approach improves task success by 64 percentage points over decentralized models trained from scratch, improves responsiveness to teammates’ actions by 40 percentage points, and outperforms centralized baselines.

📝 Abstract

Multi-robot collaboration allows robots to efficiently take on a wide range of tasks, from moving a couch through a doorway to assembling structures on a construction site. However, achieving such coordination in mobile multi-robot settings remains challenging: centralized methods conditioned on the combined observations of a team scale poorly with team size, and decentralized methods that train one policy per robot often require explicit alignment procedures or information sharing at inference time to overcome partial observability. Our key insight is that the visuomotor priors of pretrained vision-language-action (VLA) models should enable reactive, decentralized collaboration from each robot's local observations alone, without these inference-time assumptions. We propose CHORUS, a framework that adapts a single VLA backbone to control diverse multi-robot teams. At inference time, each robot runs an independent copy of CHORUS, conditioned only on its own observations and a robot-identifying prompt. In real-world experiments including mobile tape measurement, library book handovers, and laundry basket lifting, CHORUS achieves a 64 percentage point improvement over decentralized, from-scratch models, improves reactivity to teammate behavior by 40 percentage points, and outperforms centralized baselines. Together, these results show that a shared VLA backbone is capable of achieving decentralized multi-robot collaboration, without per-robot policies or inter-robot communication at inference.
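
The inference-time setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_shared_policy` is a trivial stand-in for the shared VLA backbone, and the `Robot` class, the prompt wording, and the list-of-floats observation and action types are all hypothetical names chosen for illustration. The only point it makes is structural: every robot calls the same policy, and each call sees only that robot's local observation and its identity prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in types; a real VLA would take camera images and
# language tokens and emit continuous robot actions.
Observation = List[float]
Action = List[float]
Policy = Callable[[str, Observation], Action]


def toy_shared_policy(identity_prompt: str, obs: Observation) -> Action:
    """Toy replacement for the shared backbone (NOT a real VLA).

    The output depends only on the identity prompt and the caller's own
    local observation, mirroring the decentralized setup in the abstract.
    """
    gain = 1.0 if "robot A" in identity_prompt else -1.0
    return [gain * x for x in obs]


@dataclass
class Robot:
    name: str
    identity_prompt: str  # hypothetical wording of the robot-identifying prompt
    policy: Policy        # every robot holds the same shared policy

    def act(self, local_obs: Observation) -> Action:
        # No teammate observations and no messages are passed in here.
        return self.policy(self.identity_prompt, local_obs)


def step_team(robots: List[Robot],
              local_obs: Dict[str, Observation]) -> Dict[str, Action]:
    """Each robot acts independently on its own observation only."""
    return {r.name: r.act(local_obs[r.name]) for r in robots}


robots = [
    Robot("A", "You are robot A.", toy_shared_policy),
    Robot("B", "You are robot B.", toy_shared_policy),
]
actions = step_team(robots, {"A": [0.5, 0.25], "B": [1.0, 2.0]})
```

In this sketch, changing robot B's observation cannot affect robot A's action, because `act` never receives it. Any coordination therefore has to come from what each robot perceives locally, which is the property the abstract attributes to CHORUS.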

Problem

Research questions and friction points this paper is trying to address.

multi-robot collaboration

decentralized control

partial observability

scalability

inference-time communication

Innovation

Methods, ideas, or system contributions that make the work stand out.

decentralized multi-robot collaboration

vision-language-action (VLA) models

shared policy backbone