Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

📅 2026-05-31

🤖 AI Summary

Linear Q-learning can fail to converge in general, and existing theory does not fully explain why periodic hard target updates and soft target updates stabilize it. By modeling the dynamics induced by the Bellman maximum as a switched linear system, the paper derives a convergence criterion for target-update schemes based on the joint spectral radius of the resulting switching matrix families. It proves that, under explicit spectral and step-size conditions, both hard and soft target updates converge to the exact projected Q-Bellman solution. The main analysis covers the deterministic recursion; the stochastic reinforcement-learning setting is then treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
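For readers unfamiliar with the joint spectral radius, the standard textbook definition may help. The notation below (a finite matrix family \mathcal{M}, switching signal \sigma_k, state x_k) is assumed here for illustration and is not taken from the paper.

```latex
% Joint spectral radius of a finite matrix family \mathcal{M} (standard definition)
\[
  \rho(\mathcal{M})
  \;=\;
  \lim_{k \to \infty}\;
  \sup_{A_1, \dots, A_k \in \mathcal{M}}
  \bigl\| A_k \cdots A_1 \bigr\|^{1/k}
\]
% Switched linear system driven by an arbitrary switching signal \sigma_k
\[
  x_{k+1} = A_{\sigma_k}\, x_k, \qquad A_{\sigma_k} \in \mathcal{M}
\]
% x_k \to 0 for every switching sequence if and only if \rho(\mathcal{M}) < 1
```

With a single matrix in the family, this reduces to the ordinary spectral radius, which is why the JSR is a natural stability certificate when the active matrix changes from step to step.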

📝 Abstract

Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well-established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning), using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
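To make the two target-update schemes concrete, here is a minimal sketch of a deterministic (mean) linear Q-learning recursion with either a periodic hard target update or a soft target update. The recursion form, the toy two-state MDP, and all names (`bellman_target`, `run_linear_q`, `projected_residual`, `period`, `tau`) are assumptions made for illustration, not the paper's notation or conditions. Tabular (identity) features are used so that convergence follows from classical contraction arguments; the paper's spectral conditions for general features are not checked here.

```python
import numpy as np

def bellman_target(Phi, theta_target, P, R, gamma):
    """R + gamma * P * max_a Q_target, with Q_target = Phi @ theta_target."""
    n_states = P.shape[1]
    n_actions = P.shape[0] // n_states
    q = (Phi @ theta_target).reshape(n_states, n_actions)  # rows: states
    return R + gamma * P @ q.max(axis=1)

def run_linear_q(Phi, P, R, d, gamma, alpha, n_steps, period=None, tau=None):
    """Mean linear Q-learning with a hard (period) or soft (tau) target update."""
    if (period is None) == (tau is None):
        raise ValueError("give exactly one of `period` or `tau`")
    theta = np.zeros(Phi.shape[1])
    target = theta.copy()
    for k in range(n_steps):
        y = bellman_target(Phi, target, P, R, gamma)
        # Expected semi-gradient step toward the frozen or slowly moving target.
        theta = theta + alpha * Phi.T @ (d * (y - Phi @ theta))
        if period is not None:
            if (k + 1) % period == 0:
                target = theta.copy()                    # hard update
        else:
            target = (1.0 - tau) * target + tau * theta  # soft update
    return theta

def projected_residual(Phi, theta, P, R, d, gamma):
    """Max-norm of Phi^T D (T(Phi theta) - Phi theta); zero at the fixed point."""
    y = bellman_target(Phi, theta, P, R, gamma)
    return np.max(np.abs(Phi.T @ (d * (y - Phi @ theta))))

# Hypothetical toy MDP: 2 states, 2 actions, state-action index = s * 2 + a.
P = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.1, 0.9]])
R = np.array([1.0, 0.0, 0.0, 2.0])
d = np.full(4, 0.25)   # uniform state-action weighting (assumed)
Phi = np.eye(4)        # tabular features (assumed, for a safe demo)
gamma, alpha = 0.9, 1.0

theta_hard = run_linear_q(Phi, P, R, d, gamma, alpha, 5000, period=20)
theta_soft = run_linear_q(Phi, P, R, d, gamma, alpha, 5000, tau=0.2)
res_hard = projected_residual(Phi, theta_hard, P, R, d, gamma)
res_soft = projected_residual(Phi, theta_soft, P, R, d, gamma)
print("hard-update residual:", res_hard)
print("soft-update residual:", res_soft)
```

In this tabular toy case both residuals should end up close to zero and the two parameter vectors should agree. Swapping in a non-identity `Phi` turns this into the setting the abstract describes, where convergence is no longer automatic and depends on spectral and step-size conditions.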

Problem

Research questions and friction points this paper is trying to address.

linear Q-learning

target updates

convergence

switched linear system

projected Q-Bellman solution

Innovation

Methods, ideas, or system contributions that make the work stand out.

switched linear system

joint spectral radius

linear Q-learning