Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Efficient resource allocation is central to improving QoS in wireless networks, but centralized multi-agent reinforcement learning (MARL) suffers from scalability limitations and privacy risks, while distributed training with decentralized execution (DTDE) faces non-stationarity and insufficient cooperation. This paper proposes MA-CDMP, a decentralized multi-agent conditional diffusion model planner. First, it integrates model-based RL with mean-field communication, using mean-field statistics to approximate large-scale agent interactions, which mitigates non-stationarity and improves coordination. Second, it derives an upper bound on the approximation error of the generative process distribution, which supports convergence stability. Third, it uses diffusion models to capture environment dynamics and an inverse dynamics model to guide action generation, enabling sample-efficient trajectory planning. Experiments show that the approach consistently outperforms existing MARL baselines in both average reward and QoS metrics, and that it scales well.

📝 Abstract
In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing overall Quality of Service (QoS). While centralized Multi-Agent Reinforcement Learning (MARL) frameworks rely on a central coordinator for policy training and resource scheduling, they suffer from scalability issues and privacy risks. In contrast, the Distributed Training with Decentralized Execution (DTDE) paradigm enables distributed learning and decision-making, but it struggles with non-stationarity and limited inter-agent cooperation, which can severely degrade system performance. To overcome these challenges, we propose the Multi-Agent Conditional Diffusion Model Planner (MA-CDMP) for decentralized communication resource management. Built upon the Model-Based Reinforcement Learning (MBRL) paradigm, MA-CDMP employs Diffusion Models (DMs) to capture environment dynamics and plan future trajectories, while an inverse dynamics model guides action generation, thereby alleviating the sample inefficiency and slow convergence of conventional DTDE methods. Moreover, to approximate large-scale agent interactions, a Mean-Field (MF) mechanism is introduced to assist the classifier in the DMs. This design mitigates inter-agent non-stationarity and enhances cooperation with minimal communication overhead in distributed settings. We further theoretically establish an upper bound on the distributional approximation error introduced by the MF-based diffusion generation, guaranteeing convergence stability and reliable modeling of multi-agent stochastic dynamics. Extensive experiments demonstrate that MA-CDMP consistently outperforms existing MARL baselines in terms of average reward and QoS metrics, showcasing its scalability and practicality for real-world wireless network optimization.
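
To make the mean-field idea in the abstract concrete, here is a minimal sketch of how one agent might compress its neighbours' behaviour into a single statistic. It assumes discrete actions and uses the empirical distribution of neighbour actions, a common choice in mean-field MARL; the abstract does not say which statistic MA-CDMP actually exchanges. The function name `mean_field_action` and the uniform fallback for an agent with no neighbours are hypothetical.

```python
from typing import List


def mean_field_action(neighbor_actions: List[int], num_actions: int) -> List[float]:
    """Summarise neighbours' discrete actions as an empirical distribution.

    Hypothetical helper: the statistic used by MA-CDMP is not given in the
    abstract; this is the usual mean-action form from mean-field MARL.
    """
    if not neighbor_actions:
        # Assumption: fall back to a uniform distribution when there are no neighbours.
        return [1.0 / num_actions] * num_actions
    counts = [0] * num_actions
    for action in neighbor_actions:
        counts[action] += 1
    total = len(neighbor_actions)
    return [count / total for count in counts]


# Four neighbours choosing among three resource blocks.
print(mean_field_action([0, 1, 1, 2], num_actions=3))  # [0.25, 0.5, 0.25]
```

The summary has a fixed length of `num_actions` regardless of how many neighbours contribute, which is the usual reason a mean-field statistic keeps communication overhead small as the number of agents grows.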
Problem

Research questions and friction points this paper is trying to address.

Addresses scalability and privacy issues in centralized wireless resource allocation
Solves non-stationarity and limited cooperation in distributed multi-agent systems
Improves sample efficiency and convergence for decentralized communication management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent conditional diffusion model for decentralized resource allocation
Mean-field mechanism approximates interactions with minimal communication
Inverse dynamics model guides action generation to improve efficiency
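
The plan-then-act pattern behind these points can be sketched as below. This shows the control flow only: `linear_planner` is a hypothetical stand-in for the diffusion sampler (it is not a diffusion model), and `inverse_dynamics` assumes toy additive dynamics `s_next = s + a` in place of a learned model. None of these names come from the paper.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def inverse_dynamics(state: State, next_state: State) -> List[float]:
    """Recover the action linking two consecutive states.

    Assumption: toy additive dynamics s_next = s + a. In the paper this
    role is played by a learned inverse dynamics model.
    """
    return [nxt - cur for cur, nxt in zip(state, next_state)]


def linear_planner(current: State, horizon: int) -> List[List[float]]:
    """Placeholder planner: interpolate towards an all-ones target state.

    Stands in for sampling a future state trajectory from a conditional
    diffusion model; it performs no denoising.
    """
    target = [1.0] * len(current)
    return [
        [c + (t - c) * step / horizon for c, t in zip(current, target)]
        for step in range(1, horizon + 1)
    ]


def act_from_plan(
    current: State,
    planner: Callable[[State, int], List[List[float]]],
    horizon: int = 4,
) -> List[float]:
    """Plan future states, then extract the action that reaches the first one."""
    plan = planner(current, horizon)
    return inverse_dynamics(current, plan[0])


print(act_from_plan([0.0, 0.5], linear_planner, horizon=4))  # [0.25, 0.125]
```

Separating the state planner from action extraction means the generative model only has to represent state trajectories, while actions are recovered from consecutive planned states.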
Kechen Meng
College of Information Science and Electronic Engineering, Zhejiang University
Sinuo Zhang
College of Information Science and Electronic Engineering, Zhejiang University
Rongpeng Li
Zhejiang University
Multi-Agent Communications, NetGPT, MARL, Network Slicing, AI for Fusion
Xiangming Meng
The Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University
machine learning, signal processing, Bayesian inference
Chan Wang
College of Information Science and Electronic Engineering, Zhejiang University
Ming Lei
College of Information Science and Electronic Engineering, Zhejiang University
Zhifeng Zhao
Zhejiang Lab; College of Information Science and Electronic Engineering, Zhejiang University