🤖 AI Summary
In multi-agent reinforcement learning (MARL), sparse episodic rewards impede effective exploration of the joint action space due to state-action coupling and exponential action-space growth. To address this, we propose the Mutual Intrinsic Reward (MIR) mechanism: it quantifies the influence of an individual agent's action on teammates' state transitions, yielding a differentiable, fully decentralized intrinsic reward that guides collaborative exploration toward high-impact joint actions. MIR integrates seamlessly into standard MARL frameworks, requiring neither centralized training nor additional inter-agent communication. Empirical evaluation on MiniGrid-MA, a custom sparse-reward multi-agent environment suite, demonstrates that MIR significantly improves exploration efficiency and task success rates. It outperforms state-of-the-art baselines, including MAPPO and QMIX augmented with RND, across multiple benchmark tasks, validating its effectiveness in mitigating state-action coupling and alleviating the curse of dimensionality in joint action spaces.
📝 Abstract
Episodic rewards present a significant challenge in reinforcement learning. While intrinsic reward methods have demonstrated effectiveness in single-agent reinforcement learning scenarios, their application to multi-agent reinforcement learning (MARL) remains problematic. The primary difficulties stem from two factors: (1) reward-yielding joint action trajectories become exponentially sparse as the exploration space expands, and (2) existing methods often fail to account for joint actions that can influence team states. To address these challenges, this paper introduces Mutual Intrinsic Reward (MIR), a simple yet effective enhancement strategy for MARL with extremely sparse rewards such as episodic rewards. MIR incentivizes individual agents to explore actions that affect their teammates, and when combined with original strategies, effectively stimulates team exploration and improves algorithm performance. For comprehensive experimental validation, we extend the representative single-agent MiniGrid environment to create MiniGrid-MA, a series of MARL environments with sparse rewards. Our evaluation compares the proposed method against state-of-the-art approaches in the MiniGrid-MA setting, with experimental results demonstrating superior performance.
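To make the core idea concrete, the following is a minimal, hypothetical sketch of an MIR-style signal. It assumes (none of this is specified in the text above) that agent i's influence on teammate j is measured as the counterfactual change in a learned forward model's prediction of j's next state when agent i's action is replaced by a no-op; the toy linear model and all function names (`predict_next_state`, `mutual_intrinsic_reward`) are illustrative, not the paper's actual formulation.

```python
# Hypothetical sketch of a mutual intrinsic reward: agent i is rewarded
# in proportion to how much its action changes the predicted transition
# of teammate j, relative to a no-op counterfactual.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "forward model" standing in for a learned transition model:
# predicts teammate j's next state from (j's state, agent i's one-hot action).
W_s = rng.normal(size=(4, 4))   # state -> next-state weights
W_a = rng.normal(size=(4, 3))   # action -> next-state weights

def predict_next_state(s_j, a_i_onehot):
    """Predicted next state of teammate j given agent i's action."""
    return W_s @ s_j + W_a @ a_i_onehot

def mutual_intrinsic_reward(s_j, a_i_onehot, noop_onehot):
    """Counterfactual influence of agent i's action on teammate j:
    distance between predicted transitions under the real action vs a no-op."""
    with_action = predict_next_state(s_j, a_i_onehot)
    without_action = predict_next_state(s_j, noop_onehot)
    return float(np.linalg.norm(with_action - without_action))

s_j = rng.normal(size=4)
a_i = np.array([0.0, 1.0, 0.0])   # agent i chose action 1
noop = np.array([1.0, 0.0, 0.0])  # action 0 treated as the no-op

r_int = mutual_intrinsic_reward(s_j, a_i, noop)
```

Under this reading, `r_int` is a non-negative, per-agent quantity computable from local observations, which is consistent with the decentralized, communication-free property claimed for MIR; in practice it would be added to the (sparse) extrinsic reward with a weighting coefficient.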