🤖 AI Summary
This work addresses the scalability, robustness, and generalization limitations of multi-agent reinforcement learning methods that rely on global state information, particularly their fragility under dynamic team compositions or environmental changes. To overcome these challenges, the authors propose a fully decentralized coordination framework that eschews all privileged centralized information, relying solely on local observations and peer-to-peer multi-hop communication for collaborative decision-making. The key innovations are a Distributed Graph Attention Network (D-GAT) for implicit global state inference and a Distributed Graph-Attention MAPPO (DG-MAPPO) algorithm built on local policies and value functions. Experiments show that the proposed method significantly outperforms state-of-the-art CTDE approaches across multiple benchmarks, including StarCraft II, Google Research Football, and Multi-Agent MuJoCo, and that it is effective for both homogeneous and heterogeneous agent teams.
📝 Abstract
Centralized training with decentralized execution (CTDE) has been the dominant paradigm in multi-agent reinforcement learning (MARL), but its reliance on global state information during training introduces scalability, robustness, and generalization bottlenecks. Moreover, in practical scenarios such as teammates joining or dropping out, or environment dynamics that differ from training, CTDE methods can be brittle and costly to retrain, whereas distributed approaches allow agents to adapt using only local information and peer-to-peer communication. We present a distributed MARL framework that removes the need for centralized critics or global information. First, we develop a novel Distributed Graph Attention Network (D-GAT) that performs global state inference through multi-hop communication, where agents integrate neighbor features via input-dependent attention weights in a fully distributed manner. Building on D-GAT, we propose Distributed Graph-Attention MAPPO (DG-MAPPO), a distributed MARL framework in which agents optimize local policies and value functions using local observations, multi-hop communication, and shared/averaged rewards. Empirical evaluation on the StarCraft II Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo demonstrates that our method consistently outperforms strong CTDE baselines, achieving superior coordination across a wide range of cooperative tasks with both homogeneous and heterogeneous teams. Our distributed MARL framework provides a principled and scalable solution for robust collaboration, eliminating the need for centralized training or global observability. To the best of our knowledge, DG-MAPPO is the first method to fully eliminate reliance on privileged centralized information, enabling agents to learn and act solely through local observation and peer-to-peer communication.
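To make the multi-hop aggregation idea concrete, the following is a minimal sketch of graph-attention message passing in the spirit of GAT, where each agent locally computes input-dependent softmax attention weights over its neighbors' messages and repeated rounds spread information multiple hops. All specifics here (the projection `W`, attention vector `a`, LeakyReLU scoring, and hop count) are illustrative assumptions, not the paper's exact D-GAT formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6                                      # feature dimension (kept equal in/out so rounds compose)
W = rng.normal(size=(d, d)) / np.sqrt(d)   # shared feature projection (assumed form)
a = rng.normal(size=(2 * d,))              # shared attention vector (assumed form)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def aggregate(h_i, neighbor_feats):
    """One agent fuses its own feature with neighbors' messages using
    input-dependent softmax attention weights, computed purely locally."""
    z_all = np.stack([h_i @ W] + [h @ W for h in neighbor_feats])  # self-loop first
    z_i = z_all[0]
    scores = np.array([leaky_relu(a @ np.concatenate([z_i, z_j]))
                       for z_j in z_all])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                   # attention weights sum to 1
    return alpha @ z_all                   # attention-weighted combination

# Multi-hop communication: K rounds of local aggregation propagate
# information K hops, approximating global state inference.
adjacency = {0: [1], 1: [0, 2], 2: [1]}    # 3 agents on a line graph
feats = {i: rng.normal(size=(d,)) for i in adjacency}
for _ in range(2):                         # after 2 hops, agent 0 is influenced by agent 2
    feats = {i: aggregate(feats[i], [feats[j] for j in adjacency[i]])
             for i in adjacency}
```

Because every agent uses only its own feature and its neighbors' transmitted features, no step requires a centralized critic or global state, which is the property the framework exploits.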