Diffusion Offline Reinforcement Learning for Fair and Energy-Efficient UAV-Assisted Wireless Networks

📅 2026-06-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of balancing fairness and energy efficiency in UAV-assisted wireless networks when only limited offline data is available. The authors propose a diffusion soft actor-critic (Diffusion-SAC) framework that integrates denoising diffusion probabilistic models (DDPMs) into offline reinforcement learning and combines them with conservative Q-learning (CQL) to learn signal-aware trajectory and scheduling policies from static datasets. The diffusion-based policy is more expressive and data-efficient than conventional offline RL methods, which often generalize poorly in low-data and dynamic environments. In simulations, the method increases throughput by more than 35%, reduces energy consumption, and converges more stably than existing approaches, achieving higher rewards even with scarce offline data.
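To make the idea of a diffusion model acting as a policy more concrete, the following is a minimal sketch of DDPM-style reverse sampling that turns Gaussian noise into an action conditioned on a state. It is not the authors' implementation: the noise schedule, the number of denoising steps, the state layout, and `toy_noise_predictor` (a hypothetical stand-in for a trained network) are all assumptions chosen for illustration.

```python
import numpy as np


def make_schedule(num_steps=10, beta_start=1e-4, beta_end=0.2):
    """Linear variance schedule (assumed values, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(state, noisy_action, step):
    """Hypothetical stand-in for a trained noise network eps_theta(s, a_t, t).

    It treats tanh of the first state entries as the "clean" action and
    returns the offset of the noisy action from it.
    """
    target = np.tanh(state[: noisy_action.shape[0]])
    return noisy_action - target


def sample_action(state, action_dim, noise_predictor, rng, num_steps=10):
    """Run the DDPM reverse chain from pure noise to an action."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.standard_normal(action_dim)  # start from a_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = noise_predictor(state, action, t)
        # Posterior mean of the standard DDPM reverse step.
        mean = (action - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(action_dim) if t > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[t]) * noise
    # Clip to an assumed normalized action range of [-1, 1].
    return np.clip(action, -1.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.3, -0.5, 0.1, 0.8])  # hypothetical UAV state features
    print(sample_action(state, 2, toy_noise_predictor, rng))
```

In a trained system the noise predictor would be a neural network fitted to the offline dataset, and the sampled action would encode quantities such as the UAV's movement and scheduling decision.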

📝 Abstract

The integration of generative artificial intelligence with wireless communication and signal processing systems has opened new avenues for intelligent, data-driven decision-making in future 6G networks. This work proposes a diffusion soft actor-critic (Diffusion-SAC) approach that leverages offline reinforcement learning (RL) enhanced by denoising diffusion probabilistic models (DDPMs) to optimize trajectory and scheduling control in unmanned aerial vehicle (UAV) networks. While offline RL methods, such as conservative Q-learning (CQL), can learn from static datasets, they often struggle to generalize in low-data or dynamic conditions. To address this, we combine the robustness of CQL with the generative power of diffusion models, enabling expressive and signal-aware policy learning that generalizes beyond behavior policies. Applied to a UAV-assisted wireless network, the proposed framework minimizes transmission energy and improves fairness among devices. Simulations show that Diffusion-SAC outperforms standard offline RL baselines, achieving more stable convergence and higher rewards even with limited datasets. The method enhances data efficiency, reduces energy consumption, and increases throughput by more than 35% compared to existing algorithms, demonstrating its potential for robust policy learning in next-generation wireless control systems.
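For readers unfamiliar with CQL, the sketch below shows the conservative regularizer in its common discrete-action form: it penalizes high Q-values on all actions while rewarding Q-values on actions that actually appear in the dataset. This is an illustrative assumption rather than the paper's formulation; a UAV controller with continuous actions would typically estimate the log-sum-exp term from sampled actions, and the weight `alpha` and the toy inputs are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_penalty(q_values, dataset_actions):
    """Conservative term: mean of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values:        array of shape (batch, num_actions)
    dataset_actions: integer array of shape (batch,)
    """
    q_data = q_values[np.arange(q_values.shape[0]), dataset_actions]
    return float(np.mean(logsumexp(q_values, axis=1) - q_data))


def cql_loss(q_values, dataset_actions, td_targets, alpha=1.0):
    """Bellman error on dataset actions plus the weighted conservative term."""
    q_data = q_values[np.arange(q_values.shape[0]), dataset_actions]
    bellman = 0.5 * float(np.mean((q_data - td_targets) ** 2))
    return bellman + alpha * cql_penalty(q_values, dataset_actions)


if __name__ == "__main__":
    # Toy batch: 3 states, 4 discrete actions (hypothetical values).
    q = np.array([[1.0, 0.5, 0.2, 0.1],
                  [0.0, 2.0, 0.3, 0.4],
                  [0.6, 0.6, 0.6, 0.6]])
    actions = np.array([0, 1, 2])
    targets = np.array([1.1, 1.8, 0.5])
    print(cql_loss(q, actions, targets, alpha=0.5))
```

Because the log-sum-exp is never smaller than any single Q-value, the penalty is non-negative and shrinks only when the Q-function concentrates its value on actions supported by the data, which is the source of the conservatism the abstract refers to.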

Problem

Research questions and friction points this paper is trying to address.

Offline Reinforcement Learning

UAV-Assisted Wireless Networks

Energy Efficiency

Fairness

Diffusion Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Models

Offline Reinforcement Learning

UAV Networks