🤖 AI Summary
In sparse-reward multi-agent reinforcement learning (MARL), agents suffer from insufficient exploration and dispersed collaborative attention. To address this, we propose the Focusing Influence Mechanism (FIM), the first MARL framework to incorporate the military concept of “center of gravity”: it identifies task-critical state dimensions (the centers of gravity) through stability analysis. FIM introduces a counterfactual intrinsic reward function and an eligibility-trace-enhanced credit assignment mechanism, enabling agents to jointly focus on, and stably influence, transitions involving these critical dimensions. Crucially, FIM requires no environment-specific prior knowledge and supports end-to-end training. On standard sparse-reward benchmarks, including SMAC and MPE, FIM consistently outperforms state-of-the-art MARL algorithms, with significant improvements in collaborative efficiency and convergence stability under extremely sparse reward settings.
📝 Abstract
Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) state dimensions, inspired by Clausewitz's military theory. FIM consists of three core components: (1) identifying CoG state dimensions based on their stability under agent behavior, (2) designing counterfactual intrinsic rewards to promote meaningful influence on these dimensions, and (3) encouraging persistent and synchronized focus through eligibility-trace-based credit accumulation. These mechanisms enable agents to induce more targeted and effective state transitions, facilitating robust cooperation even in extremely sparse reward settings. Empirical evaluations across diverse MARL benchmarks demonstrate that the proposed FIM significantly improves cooperative performance compared to baselines.
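The three components described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `identify_cog_dims`, `cog_change`, `counterfactual_rewards`, `FocusTrace` and `toy_step` are hypothetical, and every concrete rule is an assumption chosen for illustration. It assumes that (1) a CoG candidate is a dimension that changes in some, but at most 20%, of observed transitions; (2) an agent's counterfactual reward is how much more the CoG dimensions change under its actual action than under a default action substituted for it, which requires a simulator or learned model that can be queried from the same state; and (3) credit accumulates through a simple decaying eligibility trace with decay 0.9.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
Transition = Tuple[State, State]


# (1) Hypothetical stability criterion, not the paper's exact rule: keep dimensions
# that change in some, but at most `max_change_rate`, of the observed transitions.
# Dimensions that change on every step or never change are skipped.
def identify_cog_dims(transitions: Sequence[Transition],
                      max_change_rate: float = 0.2,
                      tol: float = 1e-9) -> List[int]:
    n_dims = len(transitions[0][0])
    change_counts = [0] * n_dims
    for s, s_next in transitions:
        for d in range(n_dims):
            if abs(s_next[d] - s[d]) > tol:
                change_counts[d] += 1
    rates = [c / len(transitions) for c in change_counts]
    return [d for d, r in enumerate(rates) if 0.0 < r <= max_change_rate]


def cog_change(s: State, s_next: State, cog_dims: Sequence[int]) -> float:
    """Total absolute change on the selected CoG dimensions."""
    return sum(abs(s_next[d] - s[d]) for d in cog_dims)


# (2) Counterfactual intrinsic reward per agent. ASSUMPTION: `step` can be queried
# counterfactually from the same state (a resettable simulator or a learned model).
def counterfactual_rewards(step: Callable[[State, List[int]], State],
                           state: State,
                           joint_action: List[int],
                           cog_dims: Sequence[int],
                           default_action: int = 0) -> List[float]:
    actual = cog_change(state, step(state, joint_action), cog_dims)
    rewards = []
    for i in range(len(joint_action)):
        cf_action = list(joint_action)
        cf_action[i] = default_action  # replace only agent i's action
        cf = cog_change(state, step(state, cf_action), cog_dims)
        rewards.append(max(0.0, actual - cf))
    return rewards


# (3) Accumulating eligibility trace: sustained influence on CoG dimensions
# earns more credit than a single isolated hit.
class FocusTrace:
    def __init__(self, n_agents: int, decay: float = 0.9):
        self.decay = decay
        self.traces = [0.0] * n_agents

    def update(self, rewards: Sequence[float]) -> List[float]:
        self.traces = [self.decay * e + r for e, r in zip(self.traces, rewards)]
        return list(self.traces)


# Toy environment: dim 0 is a clock that always advances, dim 1 is "enemy health"
# that drops only when both agents attack (action 1), dim 2 is a constant.
def toy_step(state: State, joint_action: List[int]) -> State:
    nxt = list(state)
    nxt[0] += 1.0
    if all(a == 1 for a in joint_action):
        nxt[1] -= 1.0
    return nxt


state = [0.0, 10.0, 5.0]
transitions = []
for ja in ([0, 0], [1, 0], [0, 1], [0, 0], [1, 1]):
    nxt = toy_step(state, ja)
    transitions.append((state, nxt))
    state = nxt

cog = identify_cog_dims(transitions)
trace = FocusTrace(n_agents=2)
print(cog)  # [1]
print(trace.update(counterfactual_rewards(toy_step, state, [1, 1], cog)))  # [1.0, 1.0]
print(trace.update(counterfactual_rewards(toy_step, state, [1, 0], cog)))  # [0.9, 0.9]
```

In this toy setting, the clock changes on every step and the constant never changes, so only the rarely changing health dimension is selected, and each agent earns intrinsic reward only when its own attack is necessary to reduce it. Where the environment cannot be reset to an arbitrary state, a learned transition model would presumably have to stand in for `step`.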