🤖 AI Summary
This work addresses the limitations of conventional millimeter-wave and terahertz systems that rely on predefined beam codebooks, whose performance and robustness degrade under non-ideal conditions such as non-line-of-sight propagation, hardware impairments, and feedback noise. The paper proposes a multi-agent reinforcement learning framework that learns robust beam codebooks directly from environmental feedback, without requiring prior channel state information. It presents the first systematic evaluation of stochastic policies for this task, implementing and comparing three off-policy algorithms: DDPG and TD3, which use deterministic policies, and SAC, which uses a stochastic policy. The comparison indicates that SAC is more stable and adaptable than the two deterministic methods. In simulations, SAC maintains high beamforming gain and stable training even under strong hardware impairments, non-line-of-sight conditions, and high feedback noise, outperforming the deterministic approaches.
📝 Abstract
Millimeter-wave (mmWave) and terahertz (THz) massive MIMO systems often rely on predefined beamforming codebooks, which are usually suboptimal in Non-Line-of-Sight (NLoS) conditions and for hardware-limited transceivers. Reinforcement Learning (RL) enables adaptive, data-driven codebook design without explicit Channel State Information (CSI), but the robustness of such algorithms in practical conditions is underexplored. This paper introduces a robust multi-agent RL framework that learns beam codebooks directly from environmental feedback, eliminating the need for prior channel knowledge. Our method is well-suited for real-world deployments facing unpredictable propagation and hardware constraints. We conduct a comprehensive analysis of three off-policy algorithms: Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC), and evaluate their resilience to hardware impairments and feedback noise. Simulations show that SAC consistently outperforms deterministic methods, achieving superior beamforming gains and stability in NLoS scenarios, even under severe impairments. These results demonstrate the promise of RL-based codebook design for robust mmWave/THz massive MIMO systems.
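To make the idea of "learning from environmental feedback" concrete, the following minimal sketch shows one plausible form of the feedback signal an agent might observe. It is not the paper's implementation: the abstract does not specify the reward, the array geometry, or the impairment model. The sketch assumes the feedback is the beamforming gain |w^H h|^2 of a phase-only beam on a uniform linear array, models hardware limits as coarse phase quantization, and models feedback noise as additive Gaussian noise. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def steering_vector(num_antennas, angle_rad, spacing=0.5):
    """Line-of-sight channel of a uniform linear array (assumed geometry).

    `spacing` is the element spacing in wavelengths (assumed half-wavelength).
    """
    return [cmath.exp(1j * 2 * math.pi * spacing * n * math.sin(angle_rad))
            for n in range(num_antennas)]


def beam_from_phases(phases):
    """Unit-norm, phase-only beamforming vector, as with analog phase shifters."""
    scale = 1.0 / math.sqrt(len(phases))
    return [scale * cmath.exp(1j * p) for p in phases]


def quantize_phases(phases, bits):
    """Round each phase to a 2**bits-level grid (assumed hardware limitation)."""
    step = 2 * math.pi / (2 ** bits)
    return [round(p / step) * step for p in phases]


def beamforming_gain(beam, channel):
    """Gain |w^H h|^2 of beam w on channel h (assumed reward definition)."""
    inner = sum(w.conjugate() * h for w, h in zip(beam, channel))
    return abs(inner) ** 2


def noisy_feedback(gain, noise_std, rng):
    """Gain observation corrupted by additive Gaussian noise (assumed noise model)."""
    return gain + rng.gauss(0.0, noise_std)


# Hypothetical scenario: 8 antennas, user at 20 degrees, 2-bit phase shifters.
rng = random.Random(0)
channel = steering_vector(8, math.radians(20))
ideal_phases = [cmath.phase(h) for h in channel]

# Gain of the beam perfectly matched to the channel phases.
matched_gain = beamforming_gain(beam_from_phases(ideal_phases), channel)
# Gain after the phases are restricted to the coarse hardware grid.
coarse_gain = beamforming_gain(
    beam_from_phases(quantize_phases(ideal_phases, bits=2)), channel)
# What the agent would actually observe under noisy feedback.
observed = noisy_feedback(coarse_gain, noise_std=0.1, rng=rng)

print(matched_gain, coarse_gain, observed)
```

In this toy setup an agent would only ever see `observed`, never `channel`, which is what "without explicit CSI" means here. An RL agent such as DDPG, TD3, or SAC would then adjust the phases to increase that noisy scalar. The learning loop itself is outside the scope of this sketch.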