🤖 AI Summary
In multi-agent reinforcement learning (MARL), sparse episodic rewards impede effective exploration of the joint action space due to state-action coupling and exponential action-space growth. To address this, we propose the Mutual Intrinsic Reward (MIR) mechanism: it quantifies the influence of an individual agent's action on teammates' state transitions, yielding a differentiable, fully decentralized intrinsic reward that guides collaborative exploration toward high-impact joint actions. MIR integrates seamlessly into standard MARL frameworks, requiring neither centralized training nor additional inter-agent communication. Empirical evaluation on MiniGrid-MA, a custom sparse-reward multi-agent environment suite, demonstrates that MIR significantly improves exploration efficiency and task success rates. It outperforms state-of-the-art baselines, including MAPPO and QMIX augmented with RND, across multiple benchmark tasks, validating its effectiveness in mitigating state-action coupling and alleviating the curse of dimensionality in joint action spaces.
📝 Abstract
Episodic rewards present a significant challenge in reinforcement learning. While intrinsic reward methods have demonstrated effectiveness in single-agent reinforcement learning scenarios, their application to multi-agent reinforcement learning (MARL) remains problematic. The primary difficulties stem from two factors: (1) reward-yielding joint action trajectories become exponentially sparse as the exploration space expands, and (2) existing methods often fail to account for joint actions that can influence team states. To address these challenges, this paper introduces Mutual Intrinsic Reward (MIR), a simple yet effective enhancement strategy for MARL with extremely sparse rewards such as episodic rewards. MIR incentivizes individual agents to explore actions that affect their teammates, and when combined with original strategies, effectively stimulates team exploration and improves algorithm performance. For comprehensive experimental validation, we extend the representative single-agent MiniGrid environment to create MiniGrid-MA, a series of MARL environments with sparse rewards. Our evaluation compares the proposed method against state-of-the-art approaches in the MiniGrid-MA setting, with experimental results demonstrating superior performance.
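To make the core idea concrete, the following is a minimal, hypothetical sketch of an MIR-style signal. It assumes (none of this is specified in the text above) that agent i's influence on teammate j is measured as the counterfactual change in a learned forward model's prediction of j's next state when agent i's action is replaced by a no-op; the toy linear model and all function names (`predict_next_state`, `mutual_intrinsic_reward`) are illustrative, not the paper's actual formulation.

```python
# Hypothetical sketch of a mutual intrinsic reward: agent i is rewarded
# in proportion to how much its action changes the predicted transition
# of teammate j, relative to a no-op counterfactual.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "forward model" standing in for a learned transition model:
# predicts teammate j's next state from (j's state, agent i's one-hot action).
W_s = rng.normal(size=(4, 4))   # state -> next-state weights
W_a = rng.normal(size=(4, 3))   # action -> next-state weights

def predict_next_state(s_j, a_i_onehot):
    """Predicted next state of teammate j given agent i's action."""
    return W_s @ s_j + W_a @ a_i_onehot

def mutual_intrinsic_reward(s_j, a_i_onehot, noop_onehot):
    """Counterfactual influence of agent i's action on teammate j:
    distance between predicted transitions under the real action vs a no-op."""
    with_action = predict_next_state(s_j, a_i_onehot)
    without_action = predict_next_state(s_j, noop_onehot)
    return float(np.linalg.norm(with_action - without_action))

s_j = rng.normal(size=4)
a_i = np.array([0.0, 1.0, 0.0])   # agent i chose action 1
noop = np.array([1.0, 0.0, 0.0])  # action 0 treated as the no-op

r_int = mutual_intrinsic_reward(s_j, a_i, noop)
```

Under this reading, `r_int` is a non-negative, per-agent quantity computable from local observations, which is consistent with the decentralized, communication-free property claimed for MIR; in practice it would be added to the (sparse) extrinsic reward with a weighting coefficient.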