Online Statistical Inference of Constant Sample-averaged Q-Learning

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability of Q-value estimation in reinforcement learning under noisy environments or sparse rewards, where high variance impedes reliable statistical inference. The authors propose a constant sample-averaged Q-learning framework that enables online statistical inference by adapting, for the first time in this setting, a functional central limit theorem combined with random scaling to construct asymptotically valid confidence intervals for Q-values. The theoretical analysis provides rigorous statistical guarantees, and empirical evaluations on grid-world and dynamic resource-allocation tasks demonstrate that the proposed method achieves more accurate coverage probabilities and better-calibrated confidence-interval widths than standard Q-learning.
📝 Abstract
Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, the performance of these algorithms can be impacted by high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework to perform online statistical inference for a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) for the modified algorithm under some general conditions and then construct confidence intervals for the Q-values via random scaling. We conduct experiments performing inference, using random scaling, on both the modified approach and its traditional counterpart, standard Q-learning, and report their coverage rates and confidence-interval widths on two problems: a grid-world problem as a simple toy example, and a dynamic resource-matching problem as a real-world example for comparison between the two solution approaches.
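To make the "sample-averaged" idea concrete, here is a minimal sketch in which each Q-update averages a mini-batch of independent TD targets drawn from the same state-action pair before taking a step, damping reward noise. This is one plausible reading of the modification described above; the toy chain MDP, batch size, and step size are illustrative assumptions, not the paper's exact algorithm or experiments.

```python
import random

random.seed(0)
GAMMA, ALPHA, BATCH = 0.9, 0.1, 8  # discount, step size, targets averaged per update
N_STATES, N_ACTIONS = 2, 2

def step(s, a):
    """Toy noisy-reward chain: action 0 stays, action 1 switches state;
    landing in state 1 pays +1 plus Gaussian noise."""
    s_next = s if a == 0 else 1 - s
    reward = (1.0 if s_next == 1 else 0.0) + random.gauss(0, 0.5)
    return reward, s_next

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(5000):
    s = random.randrange(N_STATES)
    a = random.randrange(N_ACTIONS)
    # Average BATCH independent TD targets instead of using a single sample.
    targets = []
    for _ in range(BATCH):
        r, s_next = step(s, a)
        targets.append(r + GAMMA * max(Q[s_next]))
    Q[s][a] += ALPHA * (sum(targets) / BATCH - Q[s][a])
```

Under these assumptions the fixed point is the usual optimal Q-function (here Q*(1, stay) = 1/(1 - 0.9) = 10); the averaging only reduces the variance of each update, which is what makes the downstream inference better behaved.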
Problem

Research questions and friction points this paper is trying to address.

online statistical inference
Q-learning
sample-averaged
confidence intervals
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

online statistical inference
sample-averaged Q-learning
functional central limit theorem
random scaling
confidence intervals
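The random-scaling construction listed above can be sketched for a scalar iterate as follows. The interval is centered at the running average of the iterates and scaled by V_n = n^{-2} Σ_s s² (θ̄_s − θ̄_n)², with the asymptotic 95% critical value 6.747 for this pivot (following Lee et al., 2022, for averaged stochastic-approximation sequences). The Robbins–Monro example generating the iterates is a hypothetical stand-in for the paper's Q-learning sequence.

```python
import math
import random

CRIT_95 = 6.747  # asymptotic two-sided 95% critical value for the random-scaling pivot

def random_scaling_ci(iterates):
    """95% confidence interval for the limit of the averaged iterates."""
    n = len(iterates)
    partial_means, running = [], 0.0
    for s, x in enumerate(iterates, start=1):
        running += (x - running) / s          # theta_bar_s, the running average
        partial_means.append(running)
    theta_bar = partial_means[-1]
    # V_n = n^{-2} * sum_s s^2 (theta_bar_s - theta_bar_n)^2
    v_n = sum((s * (m - theta_bar)) ** 2
              for s, m in enumerate(partial_means, start=1)) / n ** 2
    half = CRIT_95 * math.sqrt(v_n / n)
    return theta_bar - half, theta_bar + half

# Illustrative iterates: a noisy stochastic-approximation sequence with limit 2.0.
random.seed(1)
theta, xs = 0.0, []
for t in range(1, 20001):
    grad = (theta - 2.0) + random.gauss(0, 1.0)
    theta -= grad / (t ** 0.6)
    xs.append(theta)
lo, hi = random_scaling_ci(xs)
```

Because V_n is built entirely from partial averages already computed along the run, the interval is available online with no extra passes over the data, which is the practical appeal of random scaling over plug-in variance estimation.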
Saunak Kumar Panda
Department of Industrial Engineering, University of Houston
Tong Li
Department of Industrial Engineering, University of Houston
Ruiqi Liu
Texas Tech University
nonparametric methods · machine learning · econometrics
Yisha Xiang
University of Houston