🤖 AI Summary
To address a key bottleneck in multi-agent imitation learning, namely the reliance on synchronized joint demonstrations, this paper proposes Round-Robin Behavior Cloning (R2BC). R2BC enables a single human operator to collect demonstration data asynchronously by alternating control over the agents one at a time, eliminating the need to demonstrate in the joint multi-agent action space. Built on the behavior cloning framework, it combines serialized demonstration collection with decentralized policy training, enabling cooperative policy learning while respecting each agent's observation constraints. Evaluated on four simulated tasks, R2BC matches or surpasses baselines trained on privileged synchronized demonstrations; it further succeeds on two real-robot collaborative tasks, demonstrating strong generalization and practical applicability. The core contribution is the first systematic solution to the challenge of learning from asynchronous, single-operator multi-agent demonstrations, significantly lowering the data collection barrier for multi-agent imitation learning.
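The round-robin collection scheme described above can be illustrated with a minimal sketch. The code below is a hypothetical toy instance, not the paper's implementation: a scripted expert stands in for the human teleoperator, a nearest-neighbor lookup stands in for each agent's learned policy, and the environment is a one-dimensional placeholder. In each round, the operator controls exactly one agent (whose observation-action pairs are recorded), while the remaining agents execute their current learned policies; each agent's policy is then refit on its own demonstrations.

```python
import random

random.seed(0)

NUM_AGENTS = 2

def scripted_expert(obs):
    # Hypothetical stand-in for the human teleoperator: drive obs toward 0.
    return -1 if obs > 0 else 1

class NearestNeighborPolicy:
    """Toy per-agent policy: return the action of the closest stored observation."""
    def __init__(self):
        self.data = []  # list of (obs, action) pairs

    def fit(self, pairs):
        self.data = list(pairs)

    def act(self, obs):
        if not self.data:
            return 0  # default action before any demonstrations exist
        _, a = min(self.data, key=lambda p: abs(p[0] - obs))
        return a

def collect_round_robin(num_rounds, episode_len):
    """Round-robin collection: the operator controls one agent per round,
    while the other agents execute their current learned policies."""
    policies = [NearestNeighborPolicy() for _ in range(NUM_AGENTS)]
    datasets = [[] for _ in range(NUM_AGENTS)]
    for r in range(num_rounds):
        controlled = r % NUM_AGENTS  # the agent the human teleoperates this round
        obs = [random.uniform(-1, 1) for _ in range(NUM_AGENTS)]
        for _ in range(episode_len):
            for i in range(NUM_AGENTS):
                if i == controlled:
                    a = scripted_expert(obs[i])
                    datasets[i].append((obs[i], a))  # only the controlled agent's data is labeled
                else:
                    a = policies[i].act(obs[i])  # uncontrolled agents act autonomously
                obs[i] += 0.1 * a  # toy single-dimension dynamics
        # Behavior cloning update: refit each agent's policy on its own demonstrations.
        for i in range(NUM_AGENTS):
            policies[i].fit(datasets[i])
    return policies, datasets

policies, datasets = collect_round_robin(num_rounds=4, episode_len=5)
```

Note the key property the sketch demonstrates: no step ever requires a joint-action label, since every recorded pair comes from the single agent currently under the operator's control.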
📝 Abstract
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC matches, and in some cases surpasses, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks using real human demonstrations.