KD-MARL: Resource-Aware Knowledge Distillation in Multi-Agent Reinforcement Learning

📅 2026-04-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of deploying high-performance multi-agent reinforcement learning systems on edge devices, where computational resources, memory, and inference latency constraints preclude the direct use of computationally expensive centralized expert policies. To overcome this limitation, the authors propose a two-stage knowledge distillation framework that transfers both action behaviors and coordination structures from the expert to lightweight decentralized student agents, without relying on critic networks. By leveraging advantage-based signal distillation and structured policy supervision, the method effectively captures the expert's collaborative capabilities while accommodating heterogeneous agent architectures tailored to varying observation complexities. Evaluated on SMAC and MPE benchmarks, the approach retains over 90% of the expert's performance while reducing FLOPs by up to 28.6×, substantially enhancing deployment efficiency in resource-constrained environments.
📝 Abstract
Real-world deployment of multi-agent reinforcement learning (MARL) systems is fundamentally constrained by limited compute, memory, and inference time. While expert policies achieve high performance, they rely on costly decision cycles and large-scale models that are impractical for edge devices or embedded platforms. Knowledge distillation (KD) offers a promising path toward resource-aware execution, but existing KD methods in MARL focus narrowly on action imitation, often neglecting coordination structure and assuming uniform agent capabilities. We propose resource-aware Knowledge Distillation for Multi-Agent Reinforcement Learning (KD-MARL), a two-stage framework that transfers coordinated behavior from a centralized expert to lightweight decentralized student agents. The student policies are trained without a critic, relying instead on distilled advantage signals and structured policy supervision to preserve coordination under heterogeneous and limited observations. Our approach transfers both action-level behavior and structural coordination patterns from expert policies while supporting heterogeneous student architectures, allowing each agent's model capacity to match its observation complexity, which is crucial for efficient execution under partial or limited observability and limited onboard resources. Extensive experiments on SMAC and MPE benchmarks demonstrate that KD-MARL achieves high performance retention while substantially reducing computational cost: across standard multi-agent benchmarks, KD-MARL retains over 90% of expert performance while reducing FLOPs by up to 28.6×. The proposed approach achieves expert-level coordination and preserves it through structured distillation, enabling practical MARL deployment on resource-constrained onboard platforms.
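The abstract does not give the distillation objective in closed form. A common shape for critic-free, advantage-weighted policy distillation, which is one plausible reading of "distilled advantage signals and structured policy supervision" (all function and parameter names below are illustrative, not the authors' actual implementation), is a KL imitation term weighted by expert advantage estimates:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, expert_logits, expert_advantages, beta=1.0):
    """Illustrative advantage-weighted policy distillation loss.

    student_logits, expert_logits: (batch, n_actions) per-agent action logits.
    expert_advantages: (batch,) advantage estimates distilled from the expert,
        standing in for a student critic (an assumption about the method).
    beta: temperature controlling how strongly high-advantage states are
        emphasized during imitation.
    """
    p_expert = softmax(expert_logits)
    log_p_student = np.log(softmax(student_logits) + 1e-8)
    # Per-sample KL(expert || student) over the action distribution
    kl = (p_expert * (np.log(p_expert + 1e-8) - log_p_student)).sum(axis=-1)
    # Upweight states where the expert's advantage is high; normalize so
    # the weights average to 1 over the batch
    weights = softmax(beta * expert_advantages, axis=0) * len(expert_advantages)
    return float((weights * kl).mean())
```

Under this sketch, the student matches the expert's action distribution most closely on states the expert considered decisive, without ever training a critic of its own; the second ingredient the paper names, structured policy supervision over coordination patterns, would add a further term not shown here.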
Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning
knowledge distillation
resource-constrained deployment
heterogeneous agents
coordination structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Distillation
Multi-Agent Reinforcement Learning
Resource-Aware
Coordination Preservation
Heterogeneous Agents
Monirul Islam Pavel
School of Computer Science and Information Technology, Adelaide University, Australia
Siyi Hu
Adelaide University
Generative AI · Reinforcement Learning · Multi-Agent Systems
Muhammad Anwar Masum
School of Computer Science and Information Technology, Adelaide University, Australia
Mahardhika Pratama
Associate Professor-Level Enterprise Fellow, STEM, University of South Australia
continual learning · online learning · stream learning · few-shot learning · domain adaptation
Ryszard Kowalczyk
SmartSat CRC Professorial Chair in Artificial Intelligence
Zehong Jimmy Cao
School of Computer Science and Information Technology, Adelaide University, Australia