Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

📅 2026-05-31

🤖 AI Summary
Linear Q-learning can fail to converge in general, and existing theory does not fully explain why periodic hard target updates and soft target updates stabilize it. By modeling the dynamics induced by the Bellman maximum as a switched linear system, the paper derives convergence criteria for target-update schemes based on the joint spectral radius of the resulting switching matrix families. It proves that, under explicit spectral and step-size conditions, both hard and soft target updates converge to the exact projected Q-Bellman solution. The main analysis covers deterministic linear Q-learning; the stochastic reinforcement-learning setting is then handled by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
📝 Abstract
Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well-established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
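As a rough illustration of the two mechanisms the abstract describes, the sketch below runs a deterministic (mean) linear Q-learning recursion with a separate target parameter, refreshed either by periodic hard copies or by soft averaging. It is not the paper's code. The two-state MDP, the one-hot features (the tabular special case, in which the projected Q-Bellman solution coincides with the optimal Q-function), and every name and hyperparameter (`mean_step`, `run_hard`, `run_soft`, `period`, `tau`, `alpha`) are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_step(theta: Vector, theta_bar: Vector, phi: Matrix, reward: Vector,
              trans: Matrix, weights: Vector, gamma: float, alpha: float,
              n_actions: int) -> Vector:
    """One deterministic update of theta; the Bellman maximum uses theta_bar."""
    n_states = len(trans[0])
    # Greedy value of the target parameter at every next state.
    v_bar = [max(dot(phi[s * n_actions + a], theta_bar) for a in range(n_actions))
             for s in range(n_states)]
    new_theta = list(theta)
    for i, feat in enumerate(phi):  # i indexes the pair (s, a) as s * n_actions + a
        td = reward[i] + gamma * dot(trans[i], v_bar) - dot(feat, theta)
        for k in range(len(theta)):
            new_theta[k] += alpha * weights[i] * td * feat[k]
    return new_theta


def run_hard(phi, reward, trans, weights, gamma, alpha, n_actions,
             period: int, n_steps: int) -> Vector:
    """Periodic hard target update: copy theta into theta_bar every `period` steps."""
    theta = [0.0] * len(phi[0])
    theta_bar = list(theta)
    for t in range(n_steps):
        theta = mean_step(theta, theta_bar, phi, reward, trans, weights,
                          gamma, alpha, n_actions)
        if (t + 1) % period == 0:
            theta_bar = list(theta)
    return theta


def run_soft(phi, reward, trans, weights, gamma, alpha, n_actions,
             tau: float, n_steps: int) -> Vector:
    """Soft target update: theta_bar moves a fraction tau toward theta each step."""
    theta = [0.0] * len(phi[0])
    theta_bar = list(theta)
    for _ in range(n_steps):
        theta = mean_step(theta, theta_bar, phi, reward, trans, weights,
                          gamma, alpha, n_actions)
        theta_bar = [b + tau * (x - b) for b, x in zip(theta_bar, theta)]
    return theta


# Hypothetical 2-state, 2-action MDP; rows are ordered (s0,a0), (s0,a1), (s1,a0), (s1,a1).
N_ACTIONS = 2
GAMMA = 0.9
PHI = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot features
REWARD = [0.0, 1.0, 0.5, 2.0]
TRANS = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]  # P(s' | s, a)
WEIGHTS = [0.25, 0.25, 0.25, 0.25]  # uniform state-action distribution

theta_hard = run_hard(PHI, REWARD, TRANS, WEIGHTS, GAMMA, alpha=1.0,
                      n_actions=N_ACTIONS, period=20, n_steps=6000)
theta_soft = run_soft(PHI, REWARD, TRANS, WEIGHTS, GAMMA, alpha=1.0,
                      n_actions=N_ACTIONS, tau=0.1, n_steps=20000)
print("hard:", theta_hard)
print("soft:", theta_soft)
```

With one-hot features both runs should settle on the same parameter vector. The functions accept arbitrary feature matrices, but for general features convergence depends on spectral and step-size conditions of the kind the paper studies, which this sketch does not check.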
Problem

Research questions and friction points this paper is trying to address.

linear Q-learning
target updates
convergence
switched linear system
projected Q-Bellman solution
Innovation

Methods, ideas, or system contributions that make the work stand out.

switched linear system
joint spectral radius
linear Q-learning
target update
convergence analysis