AI Summary
In online reinforcement learning (RL), missing data must be imputed in real time at each timestep to support immediate decision-making, yet conventional offline imputation methods fail to meet the stringent requirements of latency and policy feedback dependence. To address this, we propose the first online multiple imputation framework for RL: it maintains multiple parallel imputation trajectories to model uncertainty dynamically and jointly optimizes imputation, policy updates, and action selection. Our approach integrates a lightweight ensemble imputation mechanism with tabular Q-learning, enabling real-time learning under diverse dynamic missingness patterns in Grid World environments. Experiments demonstrate that the framework significantly improves policy convergence stability and robustness against missing data, particularly under non-stationary and correlated missingness mechanisms. This work establishes a novel paradigm for efficient, interpretable online RL with missing observations, bridging statistical imputation theory and sequential decision-making under uncertainty.
Abstract
Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until enough data exist to produce stable imputation models. It also means future data collection and learning depend on previous imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness and the need for efficiency in online settings. We consider multiple approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online missing data RL methods.
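To make the idea of multiple imputation pathways concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that each pathway keeps its own imputed state, its own tabular Q-values, and a simple count-based transition model used to sample a state when the observation is missing. It also assumes one of several possible ways to combine pathways for action selection: averaging Q-values across pathways before an epsilon-greedy choice. All names here (ImputationPathway, ImputationEnsemble, N_ACTIONS, the learning-rate and discount defaults) are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict

N_ACTIONS = 4  # hypothetical action set, e.g. up, down, left, right


class ImputationPathway:
    """One pathway: its own imputed state, transition counts, and Q-table."""

    def __init__(self, n_states, rng):
        self.n_states = n_states
        self.rng = rng
        self.q = [[0.0] * N_ACTIONS for _ in range(n_states)]
        # counts[(state, action)][next_state] = times that transition was observed
        self.counts = defaultdict(lambda: defaultdict(int))
        self.state = None

    def impute(self, prev_state, action):
        """Sample a next state from this pathway's empirical transition model."""
        seen = self.counts[(prev_state, action)]
        if not seen:
            # No data for this (state, action) yet: fall back to a uniform draw.
            return self.rng.randrange(self.n_states)
        states = list(seen)
        weights = [seen[s] for s in states]
        return self.rng.choices(states, weights=weights)[0]

    def step(self, action, reward, observed_next, alpha=0.1, gamma=0.95):
        """Fill in the next state if it is missing, then do a Q-learning update."""
        prev = self.state
        if observed_next is None:
            nxt = self.impute(prev, action)
        else:
            nxt = observed_next
            # Assumption: only observed next states update the imputation model.
            self.counts[(prev, action)][nxt] += 1
        target = reward + gamma * max(self.q[nxt])
        self.q[prev][action] += alpha * (target - self.q[prev][action])
        self.state = nxt


class ImputationEnsemble:
    """Several pathways that are imputed and updated in parallel at each step."""

    def __init__(self, n_states, n_pathways=5, seed=0):
        self.rng = random.Random(seed)
        self.pathways = [
            ImputationPathway(n_states, random.Random(seed + k + 1))
            for k in range(n_pathways)
        ]

    def reset(self, start_state):
        for p in self.pathways:
            p.state = start_state

    def act(self, epsilon=0.1):
        """Epsilon-greedy on Q-values averaged over each pathway's current state."""
        if self.rng.random() < epsilon:
            return self.rng.randrange(N_ACTIONS)
        avg = [
            sum(p.q[p.state][a] for p in self.pathways) / len(self.pathways)
            for a in range(N_ACTIONS)
        ]
        return max(range(N_ACTIONS), key=avg.__getitem__)

    def update(self, action, reward, observed_next):
        """Pass observed_next=None when the state observation is missing."""
        for p in self.pathways:
            p.step(action, reward, observed_next)
```

In an interaction loop, one would call reset with the start state, then alternate act and update at every time step, passing None as the observed next state whenever it is missing. Because each pathway samples its own imputation, the pathways can disagree about the current state, and that disagreement is what carries the imputation uncertainty into action selection in this sketch.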