Missing Data Multiple Imputation for Tabular Q-Learning in Online RL

📅 2025-10-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In online reinforcement learning (RL), missing data must be imputed in real time at each timestep to support immediate decision-making, yet conventional offline imputation methods fail to meet the stringent requirements of latency and policy feedback dependence. To address this, we propose the first online multiple imputation framework for RL: it maintains multiple parallel imputation trajectories to model uncertainty dynamically and jointly optimizes imputation, policy updates, and action selection. Our approach integrates a lightweight ensemble imputation mechanism with tabular Q-learning, enabling real-time learning under diverse dynamic missingness patterns in Grid World environments. Experiments demonstrate that the framework significantly improves policy convergence stability and robustness against missing data, particularly under non-stationary and correlated missingness mechanisms. This work establishes a novel paradigm for efficient, interpretable online RL with missing observations, bridging statistical imputation theory and sequential decision-making under uncertainty.

📝 Abstract
Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until enough data exist to produce stable imputation models. It also means future data collection and learning depend on previous imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness and the need for efficiency in online settings. We consider multiple approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online missing data RL methods.
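The idea of multiple imputation pathways can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `impute_states` and `q_update_pathways`, the count-based transition model used to draw imputations, the single shared Q-table, the 1/K scaling of each update, and the hyperparameters `alpha` and `gamma`. Each of K pathways carries its own imputed state; when the next state is unobserved, each pathway draws its own imputation, and the Q-table receives one update per pathway.

```python
import numpy as np

def impute_states(counts, prev_states, action, observed, rng):
    """Return one next state per pathway.

    counts[s, a, s'] holds transition counts seen so far (hypothetical model).
    If `observed` is not None, every pathway uses the observed state.
    Otherwise each pathway draws its own imputation from the smoothed
    empirical transition distribution of its own previous state.
    """
    if observed is not None:
        return np.full(len(prev_states), observed, dtype=int)
    n_states = counts.shape[0]
    draws = np.empty(len(prev_states), dtype=int)
    for k, s in enumerate(prev_states):
        probs = counts[s, action] + 1.0  # add-one smoothing avoids zero rows
        probs = probs / probs.sum()
        draws[k] = rng.choice(n_states, p=probs)
    return draws

def q_update_pathways(Q, prev_states, action, reward, next_states,
                      alpha=0.1, gamma=0.95):
    """Apply one Q-learning update per pathway to a shared table.

    Each update is scaled by 1/K so that K agreeing pathways move the
    table roughly as far as a single ordinary update would.
    """
    K = len(prev_states)
    for s, s_next in zip(prev_states, next_states):
        td_target = reward + gamma * Q[s_next].max()
        Q[s, action] += (alpha / K) * (td_target - Q[s, action])
    return Q
```

In this sketch the reward is assumed to be observed at every step and the transition counts are assumed to be updated elsewhere, only from fully observed transitions. Keeping separate Q-tables per pathway would be an equally plausible design; the shared table is chosen here only to keep the example short.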
Problem

Research questions and friction points this paper is trying to address.

Online RL faces missing data challenges during real-time learning
Imputation must balance uncertainty capture with computational efficiency
Multiple imputation pathways address missingness in tabular Q-learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online imputation ensembles for tabular Q-learning
Multiple imputation pathways capture uncertainty efficiently
Grid World tests validate online missing data RL methods
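How several imputation pathways might feed into a single action choice can be sketched as follows. The two rules shown, averaging Q-values over the pathways' imputed states and taking a majority vote over each pathway's greedy action, are assumptions chosen for illustration and may differ from the approaches the paper evaluates. The names `select_action_mean_q` and `select_action_vote` and the `epsilon` parameter are likewise hypothetical.

```python
import numpy as np

def select_action_mean_q(Q, pathway_states, epsilon, rng):
    """Epsilon-greedy on Q-values averaged over the pathways' imputed states."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore uniformly
    mean_q = Q[pathway_states].mean(axis=0)   # shape: (n_actions,)
    return int(mean_q.argmax())

def select_action_vote(Q, pathway_states):
    """Each pathway votes for its own greedy action; the most common vote wins."""
    votes = Q[pathway_states].argmax(axis=1)
    return int(np.bincount(votes, minlength=Q.shape[1]).argmax())
```

The two rules can disagree. With `Q = np.array([[1.0, 0.0], [0.0, 3.0]])` and pathway states `[0, 0, 1]`, the averaged Q-values are about `[0.67, 1.0]`, so the mean rule with `epsilon=0` picks action 1, while two of the three pathways vote for action 0, so the vote rule picks action 0. Averaging lets a minority pathway with a large value dominate; voting ignores value magnitudes.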
Authors
Kyla Chasalow
Department of Statistics, Harvard University, Cambridge, United States
Skyler Wu
Stanford University, Booz Allen Hamilton, Harvard University
Computational Statistics, Machine Learning, Data Mining
Susan Murphy
Department of Computer Science, Harvard University, Cambridge, United States