Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach

📅 2026-03-20
🤖 AI Summary
This work addresses the limitations of conventional millimeter-wave (mmWave) and terahertz (THz) systems that rely on predefined beam codebooks. Such codebooks lose performance and robustness under non-ideal conditions such as non-line-of-sight (NLoS) propagation, hardware impairments, and feedback noise. To overcome these challenges, the paper proposes a multi-agent reinforcement learning (RL) framework that learns robust beam codebooks directly from environmental feedback, without requiring prior channel state information (CSI). It presents the first systematic evaluation of stochastic policies for this task, implementing and comparing three off-policy algorithms (DDPG, TD3, and SAC), and shows that SAC, which learns a stochastic policy, markedly improves training stability and adaptability. Simulations show that SAC maintains high beamforming gain and robust training even in harsh scenarios with strong hardware impairments, NLoS conditions, and high feedback noise, outperforming the deterministic-policy methods DDPG and TD3.
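The summary names three non-idealities the learned codebook must tolerate: NLoS propagation, hardware impairments, and feedback noise. As a hedged illustration of the kind of scalar feedback an agent might learn from, the sketch below computes the normalized beamforming gain of a phase-only analog beam, models a hardware impairment as b-bit phase-shifter quantization, and adds Gaussian noise to the reported gain. The geometric multipath channel, the quantization and noise models, and every function name (`multipath_channel`, `beamforming_gain`, `codebook_gain`, `noisy_feedback`, ...) are assumptions for illustration, not the paper's simulation setup.

```python
import cmath
import math
import random

def steering_vector(num_antennas, angle_rad):
    """Half-wavelength ULA response (assumed array): a_n = exp(j*pi*n*sin(angle))."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle_rad)) for n in range(num_antennas)]

def multipath_channel(num_antennas, paths):
    """Hypothetical geometric channel h = sum_l g_l * a(theta_l); paths = [(g_l, theta_l), ...].
    Several weak paths with no dominant one is a crude stand-in for an NLoS user."""
    h = [0j] * num_antennas
    for gain, angle in paths:
        a = steering_vector(num_antennas, angle)
        h = [hn + gain * an for hn, an in zip(h, a)]
    return h

def quantize_phase(phase, bits):
    """Assumed hardware impairment: round a phase to the nearest of 2**bits uniform levels."""
    step = 2 * math.pi / (2 ** bits)
    return round(phase / step) * step

def beamforming_gain(phases, channel, bits=None):
    """|w^H h|^2 for phase-only weights w_n = exp(j*phase_n)/sqrt(N), optionally quantized."""
    n = len(channel)
    if bits is not None:
        phases = [quantize_phase(p, bits) for p in phases]
    w = [cmath.exp(1j * p) / math.sqrt(n) for p in phases]
    return abs(sum(wn.conjugate() * hn for wn, hn in zip(w, channel))) ** 2

def codebook_gain(codebook, channel, bits=None):
    """Gain a user receives when the best beam (a list of phases) in the codebook is selected."""
    return max(beamforming_gain(beam, channel, bits) for beam in codebook)

def noisy_feedback(gain, noise_std, rng):
    """Assumed feedback model: the agent observes the gain plus zero-mean Gaussian noise."""
    return gain + rng.gauss(0.0, noise_std)

if __name__ == "__main__":
    N = 8
    h = multipath_channel(N, [(1.0, 0.3)])        # single unit-gain path
    matched = [cmath.phase(x) for x in h]         # ideal continuous phases
    print(beamforming_gain(matched, h))           # close to N = 8
    print(beamforming_gain(matched, h, bits=2))   # at most 8: coarse phase shifters lose gain
    rng = random.Random(0)
    print(noisy_feedback(beamforming_gain(matched, h, bits=2), 0.5, rng))
```

With matched continuous phases the gain reaches its Cauchy-Schwarz bound ||h||^2 (here N for a unit-gain single path); quantization and noise can only degrade or blur that signal, which is the regime in which the summary reports SAC remaining stable.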


📝 Abstract
Millimeter-wave (mmWave) and terahertz (THz) massive MIMO systems often rely on predefined beamforming codebooks, which are usually suboptimal in Non-Line-of-Sight (NLoS) conditions and for hardware-limited transceivers. Reinforcement Learning (RL) enables adaptive, data-driven codebook design without explicit Channel State Information (CSI), but the robustness of such algorithms in practical conditions is underexplored. This paper introduces a robust multi-agent RL framework that learns beam codebooks directly from environmental feedback, eliminating the need for prior channel knowledge. Our method is well-suited for real-world deployments facing unpredictable propagation and hardware constraints. We conduct a comprehensive analysis of three off-policy algorithms, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC), evaluating their resilience to hardware impairments and feedback noise. Simulations show that SAC consistently outperforms deterministic methods, achieving superior beamforming gains and stability in NLoS scenarios, even under severe impairments. These results demonstrate the promise of RL-based codebook design for robust mmWave/THz massive MIMO systems.
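The abstract contrasts two deterministic-policy methods (DDPG, TD3) with the stochastic-policy SAC. One concrete place the difference appears is the critic's bootstrap target. The sketch below writes the standard textbook targets from the original DDPG, TD3, and SAC formulations for a single transition, plus SAC's tanh-squashed Gaussian action sampling. Scalars stand in for critic network outputs; it does not reproduce the paper's networks or hyperparameters, and `action_to_phases` (mapping an action in [-1, 1] to a phase of pi*a) is a hypothetical choice for beam design.

```python
import math

def ddpg_target(r, gamma, done, q_next):
    """DDPG: y = r + gamma * Q'(s', mu'(s')), with a deterministic target policy mu'."""
    return r + gamma * (1.0 - done) * q_next

def td3_smoothed_action(mu_next, noise, noise_clip, low, high):
    """TD3 target policy smoothing: add clipped noise to mu'(s'), then clip to action bounds."""
    eps = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, mu_next + eps))

def td3_target(r, gamma, done, q1_next, q2_next):
    """TD3 clipped double-Q: y = r + gamma * min(Q1', Q2') at the smoothed target action."""
    return r + gamma * (1.0 - done) * min(q1_next, q2_next)

def squashed_gaussian_sample(mu, sigma, z):
    """SAC-style reparameterized sample a = tanh(mu + sigma*z), z ~ N(0, 1),
    with log-density corrected for the tanh change of variables (small epsilon for stability)."""
    u = mu + sigma * z
    a = math.tanh(u)
    log_normal = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    log_prob = log_normal - math.log(1.0 - a * a + 1e-6)
    return a, log_prob

def sac_target(r, gamma, done, q1_next, q2_next, log_prob_next, alpha):
    """SAC soft target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')), a' ~ pi(.|s')."""
    return r + gamma * (1.0 - done) * (min(q1_next, q2_next) - alpha * log_prob_next)

def action_to_phases(actions):
    """Hypothetical mapping from squashed actions in [-1, 1] to phase-shifter angles in [-pi, pi]."""
    return [math.pi * a for a in actions]
```

The `-alpha * log_prob_next` term rewards higher-entropy (more exploratory) policies, which is the main structural difference from the deterministic targets; DDPG and TD3 instead explore by adding external noise to a deterministic action.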
Problem

mmWave/THz systems
beam codebooks
Non-Line-of-Sight
hardware impairments
robustness
Innovation

Reinforcement Learning
Beam Codebook Design
mmWave/THz Massive MIMO
Hardware Impairments
Non-Line-of-Sight (NLoS)
Anouar Nechi
Institute of Computer Engineering, University of Lübeck, Germany
Rainer Buchty
Institute of Computer Engineering, University of Lübeck, Germany
Mladen Berekovic
Computer Engineering, Universität zu Luebeck, Germany
Computer Architecture, DSP, Embedded Systems, Low-Power
Saleh Mulhem
Institute of Computer Engineering, University of Lübeck, Germany