Missing Data Multiple Imputation for Tabular Q-Learning in Online RL

📅 2025-10-12

🤖 AI Summary

In online reinforcement learning (RL), missing data must be imputed at each time step so the agent can act immediately, and later data collection and learning depend on those earlier imputations. Imputation therefore cannot be deferred until enough data exist to fit a stable imputation model, as it can be with standard tabular data or in offline policy learning. This paper proposes fully online imputation ensembles: the agent maintains multiple imputation pathways to capture uncertainty under missingness while remaining efficient, and the paper considers several ways of incorporating those pathways into tabular Q-learning updates and action selection. A Grid World experiment with various types of missingness provides preliminary evidence that multiple imputation pathways may be a useful framework for building simple, efficient online RL methods under missing data.

📝 Abstract

Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until enough data exist to produce stable imputation models. It also means future data collection and learning depend on previous imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness and the need for efficiency in online settings. We consider multiple approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online missing data RL methods.
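The idea of carrying several imputation pathways through learning and action selection can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the imputation model (smoothed empirical transition counts), the choice to average Q-values across pathways when acting, the 1/K weighting of updates, and all function and parameter names (`impute_state`, `select_action`, `q_update`, `counts`) are assumptions made for this example.

```python
import numpy as np


def impute_state(rng, last_state, counts):
    """Draw one plausible current state when the observation is missing.

    Assumed imputation model: sample from the empirical transition counts
    out of `last_state`, with add-one smoothing so unseen states stay possible.
    """
    probs = counts[last_state] + 1.0
    probs = probs / probs.sum()
    return int(rng.choice(counts.shape[0], p=probs))


def select_action(q, pathway_states, epsilon, rng):
    """Epsilon-greedy action on Q-values averaged over the K pathway states."""
    n_actions = q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    mean_q = q[pathway_states].mean(axis=0)  # shape: (n_actions,)
    return int(np.argmax(mean_q))


def q_update(q, pathway_states, action, reward, next_pathway_states,
             alpha=0.1, gamma=0.95):
    """Apply one Q-learning step per pathway, each weighted by 1/K."""
    k = len(pathway_states)
    for s, s_next in zip(pathway_states, next_pathway_states):
        target = reward + gamma * q[s_next].max()
        q[s, action] += (alpha / k) * (target - q[s, action])
    return q
```

When the state is observed, every pathway would hold the same state and the update behaves like ordinary tabular Q-learning; when it is missing, each pathway holds its own draw from `impute_state`, so disagreement among pathways reflects imputation uncertainty.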

Problem

Research questions and friction points this paper is trying to address.

Online RL faces missing data challenges during real-time learning

Imputation must balance uncertainty capture with computational efficiency

Whether multiple imputation pathways can handle missingness in tabular Q-learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online imputation ensembles for tabular Q-learning

Multiple imputation pathways capture uncertainty efficiently

Grid World experiments give preliminary evidence for online missing-data RL methods
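For readers who want to experiment with missingness in a Grid World, the sketch below shows two simple ways to hide the agent's state observation. The summary does not specify which missingness mechanisms the paper uses, so both mechanisms here (missing completely at random, and missingness that depends on the current cell) and all names (`mcar_missing`, `state_dependent_missing`, `observe`, `risky_states`) are illustrative assumptions.

```python
import numpy as np


def mcar_missing(rng, p_missing):
    """Missing completely at random: hide the observation with fixed probability."""
    return bool(rng.random() < p_missing)


def state_dependent_missing(rng, state, risky_states, p_low, p_high):
    """Missingness that depends on the true state (hypothetical mechanism):
    observations are dropped more often in `risky_states`."""
    p = p_high if state in risky_states else p_low
    return bool(rng.random() < p)


def observe(true_state, is_missing):
    """Return the state index, or None when the observation is missing."""
    return None if is_missing else true_state
```

An environment loop would call one of the mask functions at each step and pass the result to `observe`; a `None` observation is the point at which an imputation step would be needed before the agent acts.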

🔎 Similar Papers

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets