Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high signaling overhead and poor stability caused by frequent handoffs as users move in user-centric cell-free massive MIMO (UC-mMIMO) networks, this paper proposes a deep reinforcement learning-based handoff management scheme built on the Soft Actor-Critic (SAC) algorithm with a continuous action space. The method introduces a novel reward function that incorporates a handoff penalty, encouraging the agent to cluster handoff decisions into specific time slots and thereby significantly reducing control signaling and resource reconfiguration overhead. It further develops two variants that augment the agent's observation space: one with the user's movement direction, the other with historical large-scale fading statistics, enhancing decision robustness under dynamic channel conditions. Experimental results demonstrate that the proposed approach maintains high communication throughput while reducing handoff frequency by over 40%, with sub-0.4 ms decision latency. Compared to discrete-action alternatives, it exhibits superior scalability, markedly improving network stability and spectral/resource utilization efficiency.
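The reward-shaping idea above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `penalty` weight and the flat per-slot charge whenever the serving cluster changes are assumptions; charging a fixed cost per slot with any handoff is what nudges the agent to bundle changes into few slots.

```python
def handoff_reward(rate, prev_cluster, new_cluster, penalty=0.5):
    """Per-slot reward: achievable rate minus a fixed penalty whenever
    the serving AP cluster changes.

    `penalty` is a hypothetical weight, not the paper's value. A flat
    per-slot cost (rather than one proportional to the number of APs
    swapped) rewards gathering handoffs into specific time slots.
    """
    handoff_occurred = set(prev_cluster) != set(new_cluster)
    return rate - penalty * handoff_occurred
```

For example, a slot with rate 5.0 and an unchanged cluster yields reward 5.0, while the same rate with any cluster change yields 4.5 under the default penalty.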

📝 Abstract
In the user-centric cell-free massive MIMO (UC-mMIMO) network scheme, user mobility necessitates updating the set of serving access points to maintain the user-centric clustering. Such updates are typically performed through handoff (HO) operations; however, frequent HOs lead to overheads associated with the allocation and release of resources. This paper presents a deep reinforcement learning (DRL)-based solution to predict and manage these connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with continuous action space representation, to train a deep neural network to serve as the HO policy. We present a novel proposition for a reward function that integrates a HO penalty in order to balance the attainable rate and the associated overhead related to HOs. We develop two variants of our system; the first one uses mobility direction-assisted (DA) observations that are based on the user movement pattern, while the second one uses history-assisted (HA) observations that are based on the history of the large-scale fading (LSF). Simulation results show that our DRL-based continuous action space approach is more scalable than its discrete-space counterpart, and that our derived HO policy automatically learns to gather HOs in specific time slots to minimize the overhead of initiating HOs. Our solution can also operate in real time with a response time less than 0.4 ms.
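The two observation variants described in the abstract (direction-assisted and history-assisted) can be sketched as below. The shapes, window length, and concatenation layout are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def build_observation(lsf_history, direction, variant="HA"):
    """Assemble the agent's observation for the two variants.

    DA: the most recent large-scale fading (LSF) vector plus the
        user's movement direction (assumed here to be a 2-D vector).
    HA: a sliding window of past LSF vectors, flattened into one
        observation vector.

    All shapes here are hypothetical; the paper's feature design
    may differ.
    """
    if variant == "HA":
        return np.concatenate(lsf_history)
    return np.concatenate([lsf_history[-1], direction])
```

With a window of two 3-AP LSF vectors, the HA observation has length 6, while the DA observation has length 5 (3 LSF values plus a 2-D direction).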
Problem

Research questions and friction points this paper is trying to address.

Optimize handoff operations in UC-mMIMO networks using DRL
Balance user data rate and handoff overhead via reward function
Enable real-time handoff decisions with sub-millisecond response
Innovation

Methods, ideas, or system contributions that make the work stand out.

DRL-based HO management for UC-mMIMO networks
Soft Actor-Critic with continuous action space
Mobility and history-assisted HO policy optimization
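The continuous-action idea in the bullets above can be sketched as a score-per-access-point mapping. A top-k selection rule is one plausible reading of how a continuous SAC action translates into a discrete serving cluster; the paper's exact mapping is an assumption here:

```python
import numpy as np

def cluster_from_action(action, cluster_size):
    """Map a continuous SAC action vector (one score per access point)
    to a discrete serving cluster by keeping the top-scoring APs.

    This top-k rule is illustrative; one score per AP keeps the action
    dimension equal to the number of APs, which is what makes the
    continuous formulation scale better than enumerating discrete
    cluster choices.
    """
    return set(np.argsort(action)[-cluster_size:].tolist())
```

For example, scores `[0.1, 0.9, 0.5, 0.7]` with a cluster size of 2 select APs 1 and 3.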
Hussein A. Ammar
Department of Electrical and Computer Engineering (ECE), Royal Military College of Canada
Raviraj Adve
Department of ECE, University of Toronto
Shahram Shahbazpanahi
Department of Electrical, Computer, and Software Engineering, University of Ontario Institute of Technology; Status-Only position with the Department of ECE, University of Toronto
Gary Boudreau
Ericsson Canada
Israfil Bahceci
Research Scientist, Utah State University
communications, signal processing