MIR: Efficient Exploration in Episodic Multi-Agent Reinforcement Learning via Mutual Intrinsic Reward

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL), sparse episodic rewards impede effective exploration of the joint action space due to state-action coupling and exponential action-space growth. To address this, we propose the Mutual Intrinsic Reward (MIR) mechanism: it quantifies the influence of an individual agent’s action on teammates’ state transitions, yielding a differentiable, fully decentralized intrinsic reward that guides collaborative exploration of high-impact joint actions. MIR is seamlessly integrated into standard MARL frameworks—requiring neither centralized training nor additional inter-agent communication. Empirical evaluation on MiniGrid-MA, a custom sparse-reward multi-agent environment, demonstrates that MIR significantly improves exploration efficiency and task success rates. It outperforms state-of-the-art baselines—including MAPPO and QMix augmented with RND—across multiple benchmark tasks, validating its effectiveness in mitigating state coupling and alleviating the curse of dimensionality in joint action spaces.
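The summary above does not spell out how the influence measure is computed, so the following is only a minimal sketch of one plausible, decentralized formulation: each agent keeps a small forward model of a teammate's transition and treats the prediction gap between knowing and not knowing its own action as the influence signal. The class and function names (TeammateTransitionModel, pairwise_influence), the action-free baseline, and the architecture are illustrative assumptions, not the paper's actual design; PyTorch is assumed.

```python
import torch
import torch.nn as nn

class TeammateTransitionModel(nn.Module):
    """Illustrative forward model: predicts a teammate's next observation from the
    acting agent's local observation, its one-hot action, and the teammate's
    current observation. Names and architecture are assumptions, not the paper's."""

    def __init__(self, obs_dim: int, act_dim: int, mate_obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + mate_obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, mate_obs_dim),
        )

    def forward(self, obs, act_onehot, mate_obs):
        return self.net(torch.cat([obs, act_onehot, mate_obs], dim=-1))


def pairwise_influence(model, obs, act_onehot, mate_obs, mate_next_obs):
    """Prediction-gap influence: how much better the teammate's next observation is
    predicted when the agent's actual action is known, versus an action-free
    baseline (zeros). A larger gap suggests the action affected the teammate's
    transition; the gap is clipped at zero to keep the bonus non-negative."""
    with torch.no_grad():
        pred_with_act = model(obs, act_onehot, mate_obs)
        pred_no_act = model(obs, torch.zeros_like(act_onehot), mate_obs)
    err_with = (pred_with_act - mate_next_obs).pow(2).mean(dim=-1)
    err_no = (pred_no_act - mate_next_obs).pow(2).mean(dim=-1)
    return torch.clamp(err_no - err_with, min=0.0)
```

Because each agent only needs its own observation, its own action, and what it observes of a teammate, a measure of this kind can be computed without centralized training or extra communication, which is the property the summary emphasizes.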

📝 Abstract
Episodic rewards present a significant challenge in reinforcement learning. While intrinsic reward methods have demonstrated effectiveness in single-agent reinforcement learning scenarios, their application to multi-agent reinforcement learning (MARL) remains problematic. The primary difficulties stem from two factors: (1) the exponential sparsity of joint action trajectories that lead to rewards as the exploration space expands, and (2) existing methods often fail to account for joint actions that can influence team states. To address these challenges, this paper introduces Mutual Intrinsic Reward (MIR), a simple yet effective enhancement strategy for MARL with extremely sparse rewards like episodic rewards. MIR incentivizes individual agents to explore actions that affect their teammates, and when combined with original strategies, effectively stimulates team exploration and improves algorithm performance. For comprehensive experimental validation, we extend the representative single-agent MiniGrid environment to create MiniGrid-MA, a series of MARL environments with sparse rewards. Our evaluation compares the proposed method against state-of-the-art approaches in the MiniGrid-MA setting, with experimental results demonstrating superior performance.
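To make the "actions that affect their teammates" idea concrete, below is a small, hypothetical aggregation step: pairwise influence estimates (for example, the prediction-gap measure sketched above) are averaged over teammates to give each agent a single intrinsic bonus. The function name and the averaging choice are assumptions; the paper may aggregate pairwise influence differently.

```python
import numpy as np

def team_intrinsic_bonus(influence: np.ndarray) -> np.ndarray:
    """influence[i, j] holds agent i's estimated influence on teammate j's
    transition; the diagonal (self-influence) is ignored. Each agent's intrinsic
    bonus is its mean influence over the other agents."""
    n = influence.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return (influence * off_diag).sum(axis=1) / max(n - 1, 1)

# Toy example with 3 agents: agent 0 strongly influences agent 2.
influence = np.array([[0.0, 0.1, 0.8],
                      [0.0, 0.0, 0.0],
                      [0.2, 0.1, 0.0]])
print(team_intrinsic_bonus(influence))  # [0.45 0.   0.15]
```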
Problem

Research questions and friction points this paper is trying to address.

Addresses sparse episodic rewards in multi-agent reinforcement learning
Solves exponential sparsity of joint action trajectories in MARL
Improves team exploration through mutual intrinsic reward incentives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutual Intrinsic Reward enhances multi-agent exploration
Incentivizes agents to explore teammate-affecting actions
Combines with original strategies to drive team exploration (see the shaping sketch below)
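As a rough illustration of the last point, one common way such a bonus is combined with the original sparse reward is additive shaping of the assumed form r = r_ext + beta * r_int; the coefficient and the exact combination rule are not stated here, so the values below are placeholders.

```python
def shaped_reward(extrinsic: float, intrinsic_bonus: float, beta: float = 0.05) -> float:
    """Hypothetical per-step shaping: the sparse episodic reward is kept intact
    and the mutual intrinsic bonus is added with a small coefficient beta.
    beta = 0.05 is a placeholder, not a value from the paper."""
    return extrinsic + beta * intrinsic_bonus
```

Keeping the extrinsic term untouched means the bonus only biases exploration toward high-influence joint actions; it does not replace the task objective.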
Kesheng Chen
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Wenjian Luo
Professor, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
AI and Security, Intelligent Security, Secure Intelligence, Privacy Computation, Immune Computation
Bang Zhang
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Zeping Yin
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Zipeng Ye
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China