🤖 AI Summary
For the Virtual Network Embedding with Alternative Topologies (VNEAP) problem—where each virtual network request can be instantiated with one of several functionally equivalent topologies in a time-varying environment—this paper proposes HRL-VNEAP, a hierarchical reinforcement learning framework. The high-level policy selects the most suitable topology or rejects the request, while the low-level policy performs resource mapping onto the substrate network. This work is the first to introduce hierarchical RL to VNEAP, decoupling topology selection from resource allocation to improve decision efficiency and long-term cumulative reward. The framework integrates state encoding, attention mechanisms, and multi-step reward shaping to support real-time embedding. Extensive experiments on realistic topologies show that HRL-VNEAP outperforms the strongest baseline by up to 20.7% in request acceptance ratio, 36.2% in total revenue, and 22.1% in revenue-to-cost ratio, closely approaching the MILP-optimal solution on tractable instances.
📝 Abstract
Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative Topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to **20.7%**, total revenue by up to **36.2%**, and revenue-over-cost by up to **22.1%**. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
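To make the two-level decision structure concrete, here is a minimal sketch of the hierarchical flow the abstract describes: a high-level step that picks an alternative topology (or rejects), and a low-level step that maps its virtual nodes onto substrate nodes. This is an illustrative toy, not the paper's method—the heuristic scores (smallest total CPU demand, greedy best-fit node mapping) and all names (`high_level_select`, `low_level_embed`, the `cpu` field) are assumptions for the example; the paper instead learns both policies with reinforcement learning.

```python
def high_level_select(topologies, substrate_cpu):
    """High-level decision (toy stand-in for the learned policy):
    choose the alternative topology with the smallest total CPU demand
    that fits the remaining substrate capacity; None means reject."""
    total_free = sum(substrate_cpu.values())
    feasible = [t for t in topologies if sum(t["cpu"]) <= total_free]
    if not feasible:
        return None  # reject the VNR
    return min(feasible, key=lambda t: sum(t["cpu"]))


def low_level_embed(topology, substrate_cpu):
    """Low-level decision (toy stand-in for the learned policy):
    greedily map each virtual node to the distinct substrate node with
    the most spare CPU; returns the node mapping, or None on failure."""
    free = dict(substrate_cpu)
    mapping = {}
    for v, demand in enumerate(topology["cpu"]):
        # Candidates: substrate nodes not yet used by this request.
        candidates = {n: c for n, c in free.items() if n not in mapping.values()}
        if not candidates:
            return None
        host = max(candidates, key=candidates.get)
        if free[host] < demand:
            return None
        free[host] -= demand
        mapping[v] = host
    return mapping


# Tiny example: two alternative topologies for one request.
substrate = {"A": 10, "B": 8}
alternatives = [
    {"name": "star", "cpu": [4, 4, 4]},   # 3 virtual nodes
    {"name": "chain", "cpu": [3, 3]},     # 2 virtual nodes
]

chosen = high_level_select(alternatives, substrate)
embedding = low_level_embed(chosen, substrate) if chosen else None
print(chosen["name"], embedding)  # the cheaper alternative is embedded
```

In HRL-VNEAP both functions would be neural policies trained for long-term reward; the sketch only shows how topology selection and resource mapping are decoupled into two sequential decisions.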