🤖 AI Summary
This work addresses the excessive advising, training instability, and performance degradation that arise in decentralized multi-agent reinforcement learning when action-advising methods neglect teacher-student compatibility. To this end, the authors propose the Consensus-based Communication and Knowledge Sharing (CCKS) framework, which constructs a consensus model from local observations via contrastive learning and integrates an action-scoring mechanism. Designed as a plug-in for algorithms in the decentralized training and decentralized execution (DTDE) paradigm, the method lets agents accept advice during action selection only as consensus constraints allow, balancing their own exploration against learning from teachers. Experiments on Google Research Football and the StarCraft II Multi-Agent Challenge show that the approach significantly improves coordination efficiency, learning speed, and final performance over existing DTDE baselines.
📝 Abstract
In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action-advising approaches often adhere too closely to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, reduced stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations subject to consensus-derived constraints and thus to follow the teacher's instructions more selectively. This mechanism enables agents to balance their own exploration against learning from experienced teachers, improving overall performance. The key component is the consensus model, which we construct with contrastive learning from the agents' local observations during their training phase. During action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that integrating CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at https://github.com/yuanxpy/CCKS.
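To make the idea of consensus-gated advice concrete, here is a minimal sketch of how an agent might score actions using both its own value estimates and a teacher's advice. It is not the CCKS implementation: the abstract does not specify the scoring rule, so the cosine-similarity agreement measure, the `threshold` and `bonus` parameters, and the function names `cosine` and `select_action` are all assumptions made for illustration. The consensus vectors stand in for the outputs of a contrastively trained consensus model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_action(q_values, advised_action, student_consensus,
                  teacher_consensus, threshold=0.5, bonus=1.0):
    """Pick an action, boosting the teacher's advice only under sufficient consensus.

    Hypothetical rule (an assumption, not taken from the paper): the advised
    action gets a score bonus proportional to the agreement between the
    student's and teacher's consensus vectors, and only when that agreement
    reaches `threshold`. Otherwise the student acts greedily on its own values.
    """
    agreement = cosine(student_consensus, teacher_consensus)
    scores = list(q_values)
    if agreement >= threshold:
        scores[advised_action] += bonus * agreement
    # Return the index of the highest-scoring action.
    return max(range(len(scores)), key=scores.__getitem__)


# Student prefers action 0, teacher advises action 1.
q = [1.0, 0.5, 0.2]
aligned = select_action(q, 1, [1.0, 0.0], [1.0, 0.0])     # consensus matches
misaligned = select_action(q, 1, [1.0, 0.0], [0.0, 1.0])  # consensus differs
print(aligned, misaligned)
```

In this sketch the student follows the advice only when the two consensus vectors agree; when they disagree it falls back to its own greedy choice, which is one simple way to avoid the excessive advising the abstract describes.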