Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenging problem of decentralized, partially observable multi-agent cooperation without inter-agent communication. To this end, the authors propose SMPE, a multi-agent reinforcement learning (MARL) framework that jointly models latent global state beliefs and employs adversarial exploration. Its core contributions are threefold: (1) explicit construction of individual belief representations to infer the latent global state; (2) integration of these belief representations into the policy networks to enable belief-driven policy optimization; and (3) a novel adversarial exploration mechanism that improves coordinated exploration efficiency and implicit collaboration. Evaluated on standard benchmarks, including the Multi-Agent Particle Environment (MPE), Level-Based Foraging (LBF), and Multi-Robot Warehouse (RWARE), SMPE consistently outperforms existing state-of-the-art methods. The empirical results indicate that the combination of joint belief modelling and adversarial exploration substantially improves cooperative performance in partially observable, communication-free settings.
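The first two contributions above amount to conditioning each agent's policy on a learned belief embedding of the unobservable global state. The following is a minimal numpy sketch of that idea only, not the paper's implementation: the network sizes, the `act` function, and the random untrained weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # two-layer perceptron with a tanh hidden layer
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

obs_dim, belief_dim, n_actions = 6, 4, 5  # toy sizes, chosen arbitrarily

# hypothetical belief encoder: local observation -> belief embedding
# (stands in for SMPE's learned state-modelling component)
enc_w1 = rng.normal(size=(obs_dim, 8)); enc_b1 = np.zeros(8)
enc_w2 = rng.normal(size=(8, belief_dim)); enc_b2 = np.zeros(belief_dim)

# policy network conditioned on the concatenation [observation, belief]
pol_w1 = rng.normal(size=(obs_dim + belief_dim, 8)); pol_b1 = np.zeros(8)
pol_w2 = rng.normal(size=(8, n_actions)); pol_b2 = np.zeros(n_actions)

def act(obs):
    # infer a belief about the latent global state from the local view,
    # then feed it into the policy alongside the raw observation
    belief = mlp(obs, enc_w1, enc_b1, enc_w2, enc_b2)
    logits = mlp(np.concatenate([obs, belief]), pol_w1, pol_b1, pol_w2, pol_b2)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

probs = act(rng.normal(size=obs_dim))
```

In the actual method the encoder is trained jointly with the policy; here the weights are random, so the sketch only shows the data flow.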

📝 Abstract
Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policies which encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Infer state representations from partial observations in MARL
Enhance cooperative exploration with adversarial policies
Improve discriminative abilities in partially observable environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

State modelling framework for inferring belief representations
Adversarial exploration to discover high-value states
MARL SMPE algorithm enhancing policy discriminative abilities
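The adversarial exploration idea listed above can be pictured as an intrinsic bonus: an agent is drawn toward states that its teammates' current state models reconstruct poorly, which are exactly the novel states whose discovery also improves the others' discriminative abilities. The sketch below is a hedged illustration under simplifying assumptions (linear decoders, a mean-squared reconstruction error); the paper's exact objective may differ.

```python
import numpy as np

def reconstruction_error(decoder_w, belief, state):
    # linear decoder as a stand-in for another agent's learned state model
    return float(np.mean((belief @ decoder_w - state) ** 2))

def adversarial_bonus(state, others):
    # others: list of (decoder_w, belief) pairs for the teammates.
    # The bonus is high where teammates' state models are currently
    # inaccurate, steering exploration toward novel, informative states.
    return float(np.mean([reconstruction_error(w, b, state)
                          for w, b in others]))

# toy example: one teammate whose (zeroed) model predicts nothing yet
state = np.array([1.0, 2.0, 2.0])
bonus = adversarial_bonus(state, [(np.zeros((2, 3)), np.zeros(2))])
```

Adding such a bonus to the environment reward makes exploration "adversarial" toward the teammates' models while remaining cooperative at the task level.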
A. Kontogiannis
School of Electrical and Computer Engineering, NTUA, Greece; Archimedes, Athena Research Center, Greece
Konstantinos Papathanasiou
ETH Zurich
Yi Shen
Dept. of Mechanical Engineering & Materials Science, Duke University
G. Stamou
M. Zavlanos
Dept. of Mechanical Engineering & Materials Science, Duke University
George Vouros
Professor of AI, University of Piraeus
artificial intelligence, multiagent systems, knowledge representation and reasoning, ontologies