🤖 AI Summary
To address coordination challenges arising from partial observability in multi-agent lifelong pathfinding, this paper proposes the Shared Recurrent Memory Transformer (SRMT). The method embeds a shared recurrent memory mechanism into the Transformer architecture, pooling and globally broadcasting each agent's working memory so that agents can exchange information implicitly and coordinate in decentralized settings without explicit interaction modeling. Formulated within the MARL and POMDP frameworks, the architecture integrates decentralized policy networks with Transformer-based reinforcement learning. Experiments show that SRMT significantly outperforms diverse RL baselines on the Bottleneck narrow-corridor task, especially under sparse rewards, and generalizes to longer corridors than those seen during training. On the POGEMA benchmark (Mazes/Random/MovingAI), it is competitive with state-of-the-art MARL methods and classical planning algorithms. The core contribution is a Transformer-based architecture that achieves implicit coordination through memory sharing, advancing decentralized multi-agent learning under partial observability.
📝 Abstract
Multi-agent reinforcement learning (MARL) has demonstrated significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need for explicit prediction of other agents' behavior to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT), which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor, and on the POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to longer corridors than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.
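To make the core mechanism concrete, below is a minimal NumPy sketch of what "pooling and globally broadcasting individual working memories" can look like: each agent keeps a recurrent memory vector, all vectors are pooled into a shared set, and every agent cross-attends over that pool so information flows between agents without explicit messages. This is an illustrative sketch under our own assumptions, not the authors' implementation; the projection matrices are random here, whereas in SRMT they would be learned end-to-end with the policy.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_memory_step(memories, seed=0):
    """One round of implicit information exchange.

    memories: (n_agents, d) array of per-agent working-memory vectors.
    Each agent's memory attends over the pooled memories of *all*
    agents (the global broadcast), then updates residually, mimicking
    a recurrent step. Returns an array of the same shape.
    """
    n, d = memories.shape
    # Hypothetical projection weights; learned in a real model.
    rng = np.random.default_rng(seed)
    Wq = rng.normal(scale=d ** -0.5, size=(d, d))
    Wk = rng.normal(scale=d ** -0.5, size=(d, d))
    Wv = rng.normal(scale=d ** -0.5, size=(d, d))

    Q, K, V = memories @ Wq, memories @ Wk, memories @ Wv
    # (n, n) attention: row i mixes agent i's query with every agent's memory.
    attn = softmax(Q @ K.T / np.sqrt(d))
    return memories + attn @ V  # residual recurrent update

# Three agents, 8-dimensional working memories.
mem = np.ones((3, 8))
out = shared_memory_step(mem)
print(out.shape)  # (3, 8)
```

Because the attention rows span all agents, any change in one agent's memory perturbs every other agent's next update, which is the decentralized coordination channel the paper's method relies on.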