Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the training instability and inefficient coordination that environmental non-stationarity causes in multi-agent reinforcement learning. To improve collaborative performance, the authors propose an action inference mechanism that lets each agent explicitly predict the behavior of its teammates. They also apply importance sampling based on a geometric distribution to experience replay, reportedly for the first time in multi-agent settings, to mitigate non-stationarity. Integrated into the MADDPG framework, the approach improves learning stability, exploration efficiency, and team coordination over the standard MADDPG algorithm on the Predator-Prey task from the PettingZoo benchmark.
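The replay idea summarized above can be illustrated with a minimal sketch. It assumes the geometric distribution is placed over the age of each stored transition, so that newer transitions are drawn more often, and that the resulting bias is corrected with normalized importance weights. The function names, the parameter `p`, and the weighting scheme are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
import numpy as np

def geometric_replay_probs(buffer_size, p):
    """Sampling probability per buffer slot (index 0 = oldest, last = newest).

    Assumption: a transition of age k (0 = newest) gets mass proportional to
    p * (1 - p)**k, truncated to the buffer length and renormalized.
    """
    ages = np.arange(buffer_size)          # 0 = newest transition
    probs = p * (1.0 - p) ** ages
    probs /= probs.sum()                   # renormalize after truncation
    return probs[::-1]                     # reorder so the newest slot is last

def sample_batch(buffer_size, batch_size, p=0.01, rng=None):
    """Draw indices with recency bias and return importance weights."""
    rng = np.random.default_rng() if rng is None else rng
    probs = geometric_replay_probs(buffer_size, p)
    idx = rng.choice(buffer_size, size=batch_size, p=probs)
    # Weight = uniform probability / actual sampling probability,
    # scaled so the largest weight in the batch is 1 (hypothetical choice).
    weights = 1.0 / (buffer_size * probs[idx])
    weights /= weights.max()
    return idx, weights
```

In a training loop, `idx` would select transitions from the buffer and `weights` would multiply the per-sample critic loss. A smaller `p` flattens the distribution toward uniform sampling, while a larger `p` concentrates on the most recent experience.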

📝 Abstract

We investigate multi-agent deep reinforcement learning and propose two enhancements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. First, we introduce a novel Action Inference mechanism that enables each agent to predict other agents' intended actions, thereby improving the accuracy and stability of its own policy. Second, we apply an importance sampling strategy based on a geometric distribution in the replay buffer to prioritize more recent and informative experiences, which helps mitigate the non-stationarity inherent in multi-agent environments. We evaluate both modifications on the discrete-action Predator-Prey task provided by the PettingZoo library, a flexible Python interface for general multi-agent reinforcement learning benchmarks. Our results indicate that Action Inference is effective in improving learning stability and inter-agent cooperation, and that importance sampling using a geometric distribution can lead to significant improvements in exploration efficiency over standard MADDPG. Code is available at https://github.com/shaashwathsivakumar/MARL_Proj
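As a rough illustration of the Action Inference idea in the abstract, the sketch below trains a small model that maps an agent's own observation to a distribution over another agent's discrete actions. The abstract does not specify the architecture or loss, so the linear-softmax model, the cross-entropy objective, and the class name `ActionInferenceHead` are all assumptions made for illustration and may differ from the authors' implementation.

```python
import numpy as np

class ActionInferenceHead:
    """Hypothetical predictor: own observation -> another agent's action distribution."""

    def __init__(self, obs_dim, n_actions, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(obs_dim, n_actions))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def predict_proba(self, obs):
        """Softmax over the other agent's discrete actions, one row per observation."""
        logits = obs @ self.W + self.b
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=-1, keepdims=True)

    def update(self, obs, actions):
        """One gradient step on cross-entropy; returns the loss before the step."""
        n = obs.shape[0]
        probs = self.predict_proba(obs)
        loss = -np.log(probs[np.arange(n), actions] + 1e-12).mean()
        grad_logits = probs.copy()
        grad_logits[np.arange(n), actions] -= 1.0   # d(loss)/d(logits) for softmax CE
        grad_logits /= n
        self.W -= self.lr * (obs.T @ grad_logits)
        self.b -= self.lr * grad_logits.sum(axis=0)
        return loss
```

Under this assumption, the head would be trained on (observation, observed teammate action) pairs drawn from the replay buffer, and its predictions could stand in for the other agents' actions wherever the agent's own policy or critic needs them.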

Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning

MADDPG

non-stationarity

learning stability

exploration efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Action Inference

Importance Sampling

MADDPG