Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models

📅 2024-06-22
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 4
Influential: 1
📄 PDF
🤖 AI Summary
To address the poor scalability of centralized architectures and severe non-stationarity in decentralized architectures within multi-agent reinforcement learning (MARL), this paper proposes the first world model framework specifically designed for MARL. Methodologically, it introduces a dual-path architecture: “decentralized local modeling + centralized representation aggregation.” Each agent employs a lightweight Transformer to independently model local dynamics, ensuring scalability; meanwhile, a Perceiver Transformer enables efficient global state aggregation to mitigate environmental non-stationarity. Innovatively, this work is the first to systematically apply Transformers to MARL world modeling, incorporating discrete tokenization and autoregressive sequence modeling. Evaluated on the SMAC benchmark, our approach substantially outperforms mainstream model-free and model-based baselines—achieving up to a 3.2× improvement in sample efficiency and a 12.7% increase in final win rate.

Technology Category

Application Category

📝 Abstract
Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations. As the first pioneering Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation within this context. Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
Problem

Research questions and friction points this paper is trying to address.

Addresses scalability and non-stationarity in multi-agent world models
Proposes decentralized local dynamics with centralized aggregation
Improves sample efficiency for multi-agent reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized local dynamics learning for scalability
Centralized representation aggregation from all agents
Perceiver Transformer enabling centralized aggregation
🔎 Similar Papers
No similar papers found.