Semi-Gradient SARSA Routing with Theoretical Guarantee on Traffic Stability and Weight Convergence

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses traffic control in dynamic routing for parallel server systems, focusing on online approximation of the value function and stability guarantees under an unbounded state space. We propose a semi-gradient SARSA algorithm that uses generic basis functions with flexible weights to approximate the value function. Without relying on the standard prerequisites of Lipschitz continuity of the gradient, bounded temporal-difference (TD) errors, or an a priori guarantee of ergodicity, we establish both stability of the traffic state and almost-sure convergence of the weight vector to the approximate optimum by combining a Lyapunov approach with an ordinary differential equation (ODE)-based method. Simulations show significantly faster convergence than neural-network-based methods, with an insignificant approximation error.

📝 Abstract
We consider the traffic control problem of dynamic routing over parallel servers, which arises in a variety of engineering systems such as transportation and data transmission. We propose a semi-gradient, on-policy algorithm that learns an approximate optimal routing policy. The algorithm uses generic basis functions with flexible weights to approximate the value function across the unbounded state space. Consequently, the training process lacks Lipschitz continuity of the gradient, boundedness of the temporal-difference error, and an a priori guarantee of ergodicity, which are the standard prerequisites in existing literature on reinforcement learning theory. To address this, we combine a Lyapunov approach and an ordinary differential equation-based method to jointly characterize the behavior of the traffic state and the approximation weights. Our theoretical analysis proves that the training scheme guarantees traffic state stability and ensures almost-sure convergence of the weights to the approximate optimum. We also demonstrate via simulations that our algorithm attains significantly faster convergence than neural network-based methods with an insignificant approximation error.
Problem

Research questions and friction points this paper is trying to address.

Dynamic routing control over parallel servers
Ensuring traffic stability and weight convergence
Faster convergence with minimal approximation error
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-gradient SARSA algorithm for dynamic routing
Lyapunov and ODE methods for stability analysis
Flexible weight approximation for unbounded state space
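
To make the three points above concrete, here is a minimal sketch of textbook semi-gradient SARSA with linear function approximation on a toy routing problem. It is not the paper's algorithm or model: the two-server discrete-time queue, the arrival and service probabilities, the discount factor, the polynomial basis in `features`, the decaying step size, and every function name are assumptions chosen for illustration. The toy system is deliberately lightly loaded so that queues stay short under any routing rule.

```python
import math
import random

def features(queues, action, scale=10.0):
    """Hypothetical basis: bias, linear and quadratic terms of the
    queue lengths after the arriving job joins the chosen server."""
    post = list(queues)
    post[action] += 1
    z = [x / scale for x in post]
    return [1.0] + z + [v * v for v in z]

def q_value(w, queues, action):
    """Linear approximation Q(s, a; w) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(queues, action)))

def route(w, queues, eps, rng):
    """Epsilon-greedy routing: usually pick the server with the lowest estimated cost."""
    n = len(queues)
    if rng.random() < eps:
        return rng.randrange(n)
    return min(range(n), key=lambda a: q_value(w, queues, a))

def step(queues, action, arrival_p, service_p, rng):
    """One time slot of the assumed toy model: a possible arrival is
    routed to `action`, then each busy server may finish one job."""
    nxt = list(queues)
    if rng.random() < arrival_p:
        nxt[action] += 1
    for i, mu in enumerate(service_p):
        if nxt[i] > 0 and rng.random() < mu:
            nxt[i] -= 1
    return nxt, float(sum(nxt))  # cost = total number of jobs in the system

def train(steps=5000, gamma=0.9, alpha=0.01, eps=0.1, seed=0):
    rng = random.Random(seed)
    arrival_p, service_p = 0.3, [0.4, 0.4]  # assumed, lightly loaded
    queues = [0, 0]
    w = [0.0] * (1 + 2 * len(queues))
    a = route(w, queues, eps, rng)
    for t in range(steps):
        nxt, cost = step(queues, a, arrival_p, service_p, rng)
        a_next = route(w, nxt, eps, rng)  # on-policy: the action actually taken next
        phi = features(queues, a)
        td = cost + gamma * q_value(w, nxt, a_next) - q_value(w, queues, a)
        lr = alpha / (1.0 + t / 1000.0)  # decaying step size (assumed schedule)
        # Semi-gradient: differentiate only Q(s, a; w), not the bootstrapped target.
        w = [wi + lr * td * fi for wi, fi in zip(w, phi)]
        queues, a = nxt, a_next
    return w, queues

if __name__ == "__main__":
    weights, final_queues = train()
    print("weights:", [round(x, 3) for x in weights])
    print("final queues:", final_queues)
```

Because the basis grows without bound in the queue lengths, neither the TD error nor the size of the update is bounded in general, which is the difficulty the abstract describes. The sketch sidesteps it with a light load and a small step size rather than with any theoretical guarantee.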
Yidan Wu
UM Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
Yu Yu
School of Computer Science, Wuhan University, Hubei 430072, China
Jianan Zhang
Assistant Professor, Peking University
communication networks, optimization, networked intelligence
Li Jin
UM Joint Institute and the Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China