🤖 AI Summary
In cooperative multi-agent reinforcement learning (MARL), monolithic global coalition formation leads to inaccurate credit assignment, poor task decomposition capability, and suboptimal performance. To address this, this paper introduces the *nucleolus*—a solution concept from cooperative game theory—into MARL credit assignment for the first time, proposing the Nucleolus Q-Learning framework. Our method automatically identifies stable and efficient small-scale subcoalitions via the nucleolus solution, enabling interpretable subtask decomposition while providing theoretical guarantees on convergence and stability. Evaluated on Predator-Prey and StarCraft II benchmarks across multiple difficulty levels, our approach achieves significant improvements in win rate and cumulative reward—particularly outperforming four state-of-the-art baselines in hard and super-hard scenarios. These results empirically validate the effectiveness and generalizability of multi-subcoalition structures for modeling complex cooperative tasks.
📝 Abstract
In cooperative multi-agent reinforcement learning (MARL), agents typically form a single grand coalition based on credit assignment to tackle a composite task, often resulting in suboptimal performance. This paper proposes a nucleolus-based credit assignment method grounded in cooperative game theory, enabling the autonomous partitioning of agents into multiple small coalitions that can effectively identify and complete subtasks within a larger composite task. Specifically, our nucleolus Q-learning assigns fair credit to each agent, and the nucleolus Q-operator provides interpretable theoretical guarantees for both learning convergence and the stability of the formed coalitions. In experiments on Predator-Prey and StarCraft scenarios across varying difficulty levels, our approach produced multiple effective coalitions during MARL training, leading to faster learning and superior win rates and cumulative rewards compared to four baseline methods, especially in hard and super-hard environments. These results show that nucleolus-based credit assignment is promising for complex composite tasks requiring effective subteams of agents.
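To make the central solution concept concrete: for a cooperative game with characteristic function v, the excess e(S, x) = v(S) − Σᵢ∈S xᵢ measures coalition S's dissatisfaction with allocation x, and the nucleolus is the efficient allocation whose sorted (descending) excess vector is lexicographically minimal. The sketch below illustrates this definition on a toy 3-agent game by brute-force grid search; the coalition values are illustrative assumptions, not from the paper, and the paper's actual method learns such credits via Q-learning rather than enumerating allocations.

```python
from itertools import combinations

# Toy 3-agent cooperative game (values are illustrative, not from the paper):
# v maps each coalition (as a frozenset of agents) to its achievable value.
N = (0, 1, 2)
v = {frozenset(): 0.0,
     frozenset({0}): 0.0, frozenset({1}): 0.0, frozenset({2}): 0.0,
     frozenset({0, 1}): 4.0, frozenset({0, 2}): 3.0, frozenset({1, 2}): 2.0,
     frozenset(N): 6.0}

def excesses(x):
    """Descending-sorted excess vector e(S, x) = v(S) - sum_{i in S} x_i
    over all proper non-empty coalitions S: coalition dissatisfaction."""
    ex = []
    for r in range(1, len(N)):
        for S in combinations(N, r):
            ex.append(v[frozenset(S)] - sum(x[i] for i in S))
    return sorted(ex, reverse=True)

def approx_nucleolus(step=0.05):
    """Grid search over efficient allocations (summing to v(N)) for the one
    whose sorted excess vector is lexicographically minimal -- the defining
    property of the nucleolus. Approximate: resolution limited by `step`."""
    total = v[frozenset(N)]
    k = int(round(total / step))
    best_x, best_e = None, None
    for a in range(k + 1):
        for b in range(k + 1 - a):
            x = (a * step, b * step, total - (a + b) * step)
            e = excesses(x)
            if best_e is None or e < best_e:
                best_x, best_e = x, e
    return best_x

x = approx_nucleolus()
print(x)  # credits summing to v(N) = 6 that minimize the worst dissatisfaction
```

For this toy game the search recovers the allocation (3, 2, 1), at which all three two-agent coalitions have equal excess −1; no allocation can push the largest excess lower, which is exactly the stability property the paper exploits when forming small coalitions.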