MAST: Multi-Agent Spatial Transformer for Learning to Collaborate

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Collaboration in large-scale decentralized multi-robot systems (DC-MRS) is difficult because each robot has only local sensing, limited communication bandwidth, and no centralized coordinator. This paper proposes the Multi-Agent Spatial Transformer (MAST), a decentralized Transformer architecture featuring windowed attention and a novel spatial positional encoding. MAST is explicitly designed to satisfy locality of computation, translation equivariance, and permutation equivariance, enabling both distributed training and fully decentralized deployment. Combining imitation learning, distributed information aggregation, and local perception modeling, MAST outperforms existing baselines on canonical collaborative tasks, including task allocation, navigation, and coverage control. It is robust to communication latency and scales efficiently to systems of hundreds of robots, advancing the state of the art in scalable, decentralized multi-robot coordination.

📝 Abstract
This article presents a novel multi-agent spatial transformer (MAST) for learning communication policies in large-scale decentralized and collaborative multi-robot systems (DC-MRS). Challenges in collaboration in DC-MRS arise from: (i) partially observable states, as each robot perceives only its local surroundings, (ii) limited communication range with no central server, and (iii) independent execution of actions. The robots must optimize a common task-specific objective, which, under this restricted setting, requires a communication policy that exhibits the desired collaborative behavior. The proposed MAST is a decentralized transformer architecture that learns communication policies to compute abstract information to share with other agents and to process the received information together with the robot's own observations. MAST extends the standard transformer with new positional encoding strategies and attention operations that employ windowing to limit the receptive field for MRS. These are designed for local computation, shift equivariance, and permutation equivariance, making MAST a promising approach for DC-MRS. We demonstrate the efficacy of MAST on decentralized assignment and navigation (DAN) and decentralized coverage control. Efficiently trained with imitation learning in a centralized setting, the decentralized MAST policy is robust to communication delays, scales to large teams, and outperforms baselines and other learning-based approaches.
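The windowed attention described in the abstract can be sketched as distance-masked self-attention over robot positions. This is an illustrative simplification, not the paper's implementation: the communication radius `comm_radius` and the negative-distance score bias (a toy stand-in for MAST's spatial positional encoding) are assumptions.

```python
import numpy as np

def windowed_attention(features, positions, comm_radius, W_q, W_k, W_v):
    """Toy distance-windowed self-attention over a robot team.

    features:  (n, d) per-robot feature vectors
    positions: (n, 2) robot coordinates
    Attention is restricted to robots within comm_radius, mimicking a
    limited receptive field; a negative-distance score bias stands in
    for a relative spatial positional encoding.  Because only relative
    positions enter the computation, the output is unchanged when all
    robots are translated by the same offset (shift equivariance).
    """
    q, k, v = features @ W_q, features @ W_k, features @ W_v
    # Pairwise distances define the communication window.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    scores = (q @ k.T) / np.sqrt(q.shape[1]) - dist   # toy relative-position bias
    scores = np.where(dist <= comm_radius, scores, -np.inf)  # mask non-neighbors
    # Row-wise softmax; each robot always attends at least to itself.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

Since only pairwise distances enter the scores, translating every position by the same offset leaves the output unchanged, and permuting the robots permutes the output rows accordingly.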
Problem

Research questions and friction points this paper is trying to address.

Addresses collaboration challenges in decentralized multi-robot systems with partial observability
Operates under limited communication range with no central server
Handles independent action execution while optimizing common task objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized transformer architecture for multi-robot communication
Windowed attention with novel spatial positional encoding strategies
Imitation learning for robust and scalable decentralized policy
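The equivariance properties claimed above can be sanity-checked on a minimal stand-in for the communication step. The neighbor-averaging rule below is a hypothetical simplification of the aggregation, not MAST's actual operator; `comm_radius` is an assumed parameter.

```python
import numpy as np

def neighbor_average(features, positions, comm_radius):
    """Each robot averages the features of robots inside its communication
    window (including itself) -- a minimal local-aggregation stand-in."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    mask = (dist <= comm_radius).astype(float)
    return (mask @ features) / mask.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 4))          # per-robot features
p = rng.uniform(0, 10, size=(6, 2))  # robot positions
out = neighbor_average(x, p, comm_radius=4.0)

perm = rng.permutation(6)
# Permutation equivariance: relabeling robots permutes the outputs.
assert np.allclose(neighbor_average(x[perm], p[perm], 4.0), out[perm])
# Shift equivariance: translating the whole team leaves outputs unchanged.
assert np.allclose(neighbor_average(x, p + 3.0, 4.0), out)
```

Any aggregation built from pairwise relative positions and symmetric pooling passes these checks, which is the structural property the bullets attribute to MAST.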