Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) suffers from performance degradation under partial observability, while fully centralized approaches face scalability bottlenecks. To address this, we propose Centralized Permutation Equivariant (CPE) learning, a lightweight framework built around a Global-Local Permutation Equivariant (GLPE) network that jointly enables centralized policy modeling and intrinsic handling of agent-permutation symmetry, supporting end-to-end centralized training and execution. CPE integrates permutation-equivariant networks with both value decomposition and actor-critic methods to explicitly model collaborative multi-agent structure. Evaluated on standard cooperative benchmarks (MPE, SMAC, and RWARE), CPE substantially improves over standard CTDE algorithms and matches state-of-the-art performance on RWARE. By preserving expressive power while remaining scalable, CPE bridges the trade-off between representational capacity and computational feasibility in MARL.
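The paper does not spell out the GLPE architecture in this summary, but the permutation-equivariance property it relies on can be illustrated with a minimal Deep Sets-style layer: each agent's output combines a per-agent (local) transform with an order-invariant (global) pooled context, so permuting the agents permutes the outputs identically. The layer below and its weight names are a hypothetical sketch, not the paper's implementation.

```python
import numpy as np

def pe_layer(X, W_local, W_global):
    """Permutation-equivariant layer sketch (Deep Sets style).
    X: (n_agents, d_in) per-agent features.
    Each row's output mixes its own transform with a mean-pooled
    global context, so f(P @ X) == P @ f(X) for any permutation P."""
    local = X @ W_local                                     # per-agent transform
    global_ctx = X.mean(axis=0, keepdims=True) @ W_global   # order-invariant context
    return local + global_ctx                               # broadcast context to all agents

rng = np.random.default_rng(0)
n_agents, d_in, d_out = 4, 3, 5
X = rng.normal(size=(n_agents, d_in))
W_l = rng.normal(size=(d_in, d_out))
W_g = rng.normal(size=(d_in, d_out))

# Check equivariance: permuting inputs first equals permuting outputs after.
perm = rng.permutation(n_agents)
assert np.allclose(pe_layer(X[perm], W_l, W_g),
                   pe_layer(X, W_l, W_g)[perm])
```

Because the global term is a mean over agents, the layer's parameter count is independent of the number of agents, which is one plausible reading of the "lightweight, scalable" claim above.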

📝 Abstract
The Centralized Training with Decentralized Execution (CTDE) paradigm has gained significant attention in multi-agent reinforcement learning (MARL) and is the foundation of many recent algorithms. However, decentralized policies operate under partial observability and often yield suboptimal performance compared to centralized policies, while fully centralized approaches typically face scalability challenges as the number of agents increases. We propose Centralized Permutation Equivariant (CPE) learning, a centralized training and execution framework that employs a fully centralized policy to overcome these limitations. Our approach leverages a novel permutation equivariant architecture, Global-Local Permutation Equivariant (GLPE) networks, that is lightweight, scalable, and easy to implement. Experiments show that CPE integrates seamlessly with both value decomposition and actor-critic methods, substantially improving the performance of standard CTDE algorithms across cooperative benchmarks including MPE, SMAC, and RWARE, and matching the performance of state-of-the-art RWARE implementations.
Problem

Research questions and friction points this paper is trying to address.

Overcoming partial observability in decentralized multi-agent policies
Addressing scalability challenges in centralized training approaches
Improving performance of cooperative multi-agent reinforcement learning algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Centralized Permutation Equivariant learning framework
Global-Local Permutation Equivariant network architecture
Seamless integration with value decomposition methods
Zhuofan Xu
Université Paris-Saclay, CNRS, ENS Paris-Saclay, LMF, Gif-sur-Yvette, France
Benedikt Bollig
CNRS, LMF, ENS Paris-Saclay, Université Paris-Saclay
automata theory, logic, concurrency theory, artificial intelligence, machine learning
Matthias Függer
CNRS, LMF, ENS Paris-Saclay, Université Paris-Saclay
synthetic biology, distributed computing, circuits, microbiology
Thomas Nowak
Université Paris-Saclay, CNRS, ENS Paris-Saclay, LMF, Gif-sur-Yvette, France and Institut Universitaire de France, Gif-sur-Yvette, France
Vincent Le Dréau
Université Paris-Saclay, CNRS, ENS Paris-Saclay, LMF, Gif-sur-Yvette, France