🤖 AI Summary
In multi-task multi-agent reinforcement learning (MT-MARL), agents suffer from task-irrelevant interference and limited cross-task knowledge transfer. To address this, we propose the first skill-graph-driven hierarchical framework for MT-MARL. A high-level skill graph serves as a task-agnostic abstraction over policies and guides low-level collaborative execution by a standard MARL algorithm (e.g., MAPPO). Crucially, the high-level module is trained independently of the low-level one, enabling knowledge reuse even across unrelated tasks. The key contribution is the first integration of skill graphs into MT-MARL, yielding effective transfer across unrelated tasks, strong generalization, and scalability. Extensive experiments on complex multi-task benchmarks demonstrate that our method significantly outperforms hierarchical MAPPO baselines: it achieves 12–28% higher cumulative rewards and converges 35–50% faster.
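The summary describes a two-level control loop: a high-level module consults the skill graph to choose a skill, and a low-level MARL policy executes it for all agents. The abstract gives no implementation details, so the following is a minimal sketch under stated assumptions; `SkillGraph`, `select_skill`, and `low_level_act` are hypothetical names, and the low-level policy is a stub standing in for a trained MAPPO policy.

```python
# Hypothetical sketch of a skill-graph-driven hierarchical controller.
# The real system trains the skill graph independently of the low level;
# here both are hand-written stubs for illustration only.

class SkillGraph:
    """Task-agnostic skill graph: nodes are skills, edges are allowed transitions."""
    def __init__(self, edges):
        self.edges = edges  # dict: skill -> list of successor skills

    def successors(self, skill):
        return self.edges.get(skill, [])


def select_skill(graph, current_skill, task_context):
    """High-level step: pick the next skill among graph successors.
    Stub heuristic: score each candidate by overlap with task keywords."""
    candidates = graph.successors(current_skill) or [current_skill]
    return max(candidates, key=lambda s: sum(w in s for w in task_context))


def low_level_act(skill, observations):
    """Low-level step: in the paper this would be a MARL policy (e.g., MAPPO)
    conditioned on the chosen skill; here a placeholder joint action."""
    return [f"{skill}:{obs}" for obs in observations]


# Toy graph over three cooperative skills.
graph = SkillGraph({
    "explore": ["explore", "gather"],
    "gather":  ["gather", "deliver"],
    "deliver": ["explore"],
})

skill = select_skill(graph, "explore", task_context=["gather"])
actions = low_level_act(skill, observations=["agent0_obs", "agent1_obs"])
```

The point of the decoupling is that `select_skill` and the graph can be reused across tasks, while only the low-level policy needs task-specific (or task-family-specific) training.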
📝 Abstract
Multi-task multi-agent reinforcement learning (MT-MARL) has recently gained attention for its potential to enhance the adaptability of MARL across multiple tasks. However, existing multi-task learning methods struggle with complex problems: they cannot handle unrelated tasks and have limited knowledge-transfer capabilities. In this paper, we propose a hierarchical approach that efficiently addresses these challenges. The high-level module utilizes a skill graph, while the low-level module employs a standard MARL algorithm. Our approach offers two contributions. First, we consider the MT-MARL problem in the context of unrelated tasks, expanding the scope of MTRL. Second, the skill graph serves as the upper layer of the standard hierarchical approach and is trained independently of the lower layer, effectively handling unrelated tasks and enhancing knowledge transfer. Extensive experiments validate these advantages and demonstrate that the proposed method outperforms the latest hierarchical MAPPO algorithms. Videos and code are available at https://github.com/WindyLab/MT-MARL-SG.