🤖 AI Summary
To address the challenge of imputation under missing-not-at-random (MNAR) mechanisms, this paper proposes F3I, a task-aware, fast online imputation method. F3I combines weighted k-nearest neighbors, iterative optimization, and online learning, and it can be trained jointly, end to end, with a downstream classification or regression task. Theoretically, the paper provides an analysis of imputation quality under several missingness mechanisms, including MNAR. Empirically, F3I is reported to achieve state-of-the-art performance on synthetic benchmarks, drug repurposing, and handwritten-digit recognition, outperforming mainstream imputation methods in both accuracy and computational efficiency. Its core contributions are (i) a task-driven online joint optimization of imputation and the downstream task and (ii) theoretical guarantees, paired with empirical validation, that cover MNAR scenarios.
📝 Abstract
Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification and regression. Yet they are pervasive in many real-life use cases, for instance, in drug discovery research. Moreover, imputation methods can be time-consuming and offer few guarantees on imputation quality, especially under missing-not-at-random (MNAR) mechanisms. We propose an imputation approach named F3I, based on the iterative improvement of a K-nearest neighbor imputation, that learns a weight for each neighbor of a data point by optimizing for the most likely distribution of points over data points. This algorithm can also be trained jointly with a downstream task on the imputed values. We provide a theoretical analysis of the imputation quality of F3I for several types of missingness mechanisms. We also demonstrate the performance of F3I on synthetic data sets and on real-life drug repurposing and handwritten-digit recognition data.
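To make the building block concrete, the following is a minimal sketch of plain weighted K-nearest neighbor imputation, the component that F3I is described as iteratively improving. It is not the authors' algorithm: the neighbor weights here are fixed inputs rather than learned, there is no iterative refinement or downstream task, and the function name `weighted_knn_impute` and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def weighted_knn_impute(X, k=3, weights=None):
    """Fill NaN entries with a weighted average over the k nearest rows.

    Illustrative only: `weights` is a fixed vector (nearest neighbor first),
    whereas the abstract describes learning these weights.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)

    # Initial guess: replace each missing entry by its column mean,
    # so that distances between rows are well defined.
    col_means = np.nanmean(X, axis=0)
    X_init = np.where(mask, col_means, X)

    if weights is None:
        weights = np.full(k, 1.0 / k)  # assumption: uniform weights by default
    weights = np.asarray(weights, dtype=float)

    X_out = X_init.copy()
    for i in np.flatnonzero(mask.any(axis=1)):
        # Euclidean distance from row i to every row of the initial guess.
        d = np.linalg.norm(X_init - X_init[i], axis=1)
        d[i] = np.inf  # a row is not its own neighbor
        nn = np.argsort(d)[:k]

        # Weighted average of the neighbors, written only into missing slots.
        estimate = weights @ X_init[nn]
        X_out[i, mask[i]] = estimate[mask[i]]
    return X_out

# Small usage example on hypothetical data: one missing value in row 1.
X = np.array([[1.0, 2.0],
              [1.1, np.nan],
              [0.9, 2.2],
              [5.0, 9.0]])
X_filled = weighted_knn_impute(X, k=2)
```

Observed entries are left unchanged; only the missing slots are overwritten. In a learned variant, the `weights` vector would be the quantity being optimized instead of being supplied by the caller.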