Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenging problem of decentralized, partially observable multi-agent cooperation without inter-agent communication. To this end, the authors propose SMPE, a multi-agent reinforcement learning (MARL) framework that jointly models latent global state beliefs and employs adversarial exploration. Its core contributions are threefold: (1) explicit construction of individual belief representations to infer the latent global state; (2) integration of these belief representations into the policy networks to enable belief-driven policy optimization; and (3) a novel adversarial exploration mechanism that improves coordinated exploration efficiency and implicit collaboration. Evaluated on standard benchmarks, including the Multi-Agent Particle Environment (MPE), Level-Based Foraging (LBF), and Multi-Robot Warehouse (RWARE), SMPE consistently outperforms existing state-of-the-art methods. The empirical results indicate that the combination of joint belief modelling and adversarial exploration substantially improves cooperative performance in partially observable, communication-free settings.
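The first two contributions above amount to conditioning each agent's policy on a learned belief embedding of the unobservable global state. The following is a minimal numpy sketch of that idea only, not the paper's implementation: the network sizes, the `act` function, and the random untrained weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # two-layer perceptron with a tanh hidden layer
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

obs_dim, belief_dim, n_actions = 6, 4, 5  # toy sizes, chosen arbitrarily

# hypothetical belief encoder: local observation -> belief embedding
# (stands in for SMPE's learned state-modelling component)
enc_w1 = rng.normal(size=(obs_dim, 8)); enc_b1 = np.zeros(8)
enc_w2 = rng.normal(size=(8, belief_dim)); enc_b2 = np.zeros(belief_dim)

# policy network conditioned on the concatenation [observation, belief]
pol_w1 = rng.normal(size=(obs_dim + belief_dim, 8)); pol_b1 = np.zeros(8)
pol_w2 = rng.normal(size=(8, n_actions)); pol_b2 = np.zeros(n_actions)

def act(obs):
    # infer a belief about the latent global state from the local view,
    # then feed it into the policy alongside the raw observation
    belief = mlp(obs, enc_w1, enc_b1, enc_w2, enc_b2)
    logits = mlp(np.concatenate([obs, belief]), pol_w1, pol_b1, pol_w2, pol_b2)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

probs = act(rng.normal(size=obs_dim))
```

In the actual method the encoder is trained jointly with the policy; here the weights are random, so the sketch only shows the data flow.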

📝 Abstract
Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policies which encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Infer state representations from partial observations in MARL
Enhance cooperative exploration with adversarial policies
Improve discriminative abilities in partially observable environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

State modelling framework for inferring belief representations
Adversarial exploration to discover high-value states
MARL SMPE algorithm enhancing policy discriminative abilities
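The adversarial exploration idea listed above can be pictured as an intrinsic bonus: an agent is drawn toward states that its teammates' current state models reconstruct poorly, which are exactly the novel states whose discovery also improves the others' discriminative abilities. The sketch below is a hedged illustration under simplifying assumptions (linear decoders, a mean-squared reconstruction error); the paper's exact objective may differ.

```python
import numpy as np

def reconstruction_error(decoder_w, belief, state):
    # linear decoder as a stand-in for another agent's learned state model
    return float(np.mean((belief @ decoder_w - state) ** 2))

def adversarial_bonus(state, others):
    # others: list of (decoder_w, belief) pairs for the teammates.
    # The bonus is high where teammates' state models are currently
    # inaccurate, steering exploration toward novel, informative states.
    return float(np.mean([reconstruction_error(w, b, state)
                          for w, b in others]))

# toy example: one teammate whose (zeroed) model predicts nothing yet
state = np.array([1.0, 2.0, 2.0])
bonus = adversarial_bonus(state, [(np.zeros((2, 3)), np.zeros(2))])
```

Adding such a bonus to the environment reward makes exploration "adversarial" toward the teammates' models while remaining cooperative at the task level.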
A. Kontogiannis
School of Electrical and Computer Engineering, NTUA, Greece; Archimedes, Athena Research Center, Greece
Konstantinos Papathanasiou
ETH Zurich
Yi Shen
Dept. of Mechanical Engineering & Materials Science, Duke University
G. Stamou
M. Zavlanos
Dept. of Mechanical Engineering & Materials Science, Duke University
George Vouros
Professor of AI, University of Piraeus
artificial intelligence, multiagent systems, knowledge representation and reasoning, ontologies