Fast Iterative and Task-Specific Imputation with Online Learning

📅 2025-01-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address imputation under missing-not-at-random (MNAR) mechanisms, this paper proposes F3I, a fast, task-aware imputation method based on online learning. F3I combines weighted k-nearest-neighbor imputation, iterative refinement, and online learning of the neighbor weights, and it can be trained jointly, end to end, with a downstream classification or regression task. On the theory side, the paper analyzes imputation quality under several missingness mechanisms, including MNAR. Empirically, F3I is evaluated on synthetic benchmarks, drug repurposing, and handwritten-digit recognition, where it is reported to outperform mainstream imputation methods in both accuracy and computational efficiency. Its main contributions are (i) task-driven joint optimization of imputation and the downstream model through online learning and (ii) imputation-quality guarantees that extend to MNAR settings.
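The difference between missingness mechanisms mentioned above can be made concrete with a small simulation. The sketch below is not taken from the paper: the function names `mcar_mask` and `mnar_mask` are hypothetical, and the MNAR rule used here (the largest values are the ones that go missing) is just one simple self-masking example among many possible MNAR mechanisms.

```python
import random
import statistics

def mcar_mask(column, rate, rng):
    """MCAR: each entry is dropped with the same probability, regardless of its value."""
    return [rng.random() < rate for _ in column]

def mnar_mask(column, rate):
    """Self-masking MNAR (illustrative assumption): the largest values go missing."""
    n_missing = round(rate * len(column))
    if n_missing == 0:
        return [False] * len(column)
    cutoff = sorted(column)[-n_missing]  # smallest value that gets dropped
    return [value >= cutoff for value in column]

rng = random.Random(0)
column = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Keep only the entries that each mechanism leaves observed.
observed_mcar = [v for v, m in zip(column, mcar_mask(column, 0.3, rng)) if not m]
observed_mnar = [v for v, m in zip(column, mnar_mask(column, 0.3)) if not m]

true_mean = statistics.fmean(column)
mcar_mean = statistics.fmean(observed_mcar)  # stays close to true_mean
mnar_mean = statistics.fmean(observed_mnar)  # biased downward: large values are hidden
print(true_mean, mcar_mean, mnar_mean)
```

Under MCAR the observed mean stays close to the true mean, whereas under this MNAR rule it is systematically too low, which is why methods that only look at observed values (such as mean imputation) tend to struggle in MNAR settings.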

📝 Abstract

Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification and regression, yet they are pervasive in real-life use cases, for instance in drug discovery research. Moreover, imputation methods can be time-consuming and offer few guarantees on imputation quality, especially for not-missing-at-random mechanisms. We propose an imputation approach named F3I, based on the iterative improvement of a K-nearest-neighbor imputation that learns the weights for each neighbor of a data point, optimizing for the most likely distribution of points over data points. This algorithm can also be jointly trained with a downstream task on the imputed values. We provide a theoretical analysis of the imputation quality of F3I for several types of missingness mechanisms. We also demonstrate the performance of F3I on synthetic data sets and on real-life drug repurposing and handwritten-digit recognition data.
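To illustrate the general idea of iteratively refined, weighted K-nearest-neighbor imputation, here is a minimal sketch using only the Python standard library. It is not the F3I algorithm: the names `column_means`, `weighted_knn_impute`, `weights`, and `n_rounds` are hypothetical, and the neighbor weights are fixed by the caller, whereas F3I learns them. The sketch also assumes every column has at least one observed value and that the data set has more rows than neighbors.

```python
import math

def column_means(X):
    """Mean of the observed (non-None) entries of each column."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def weighted_knn_impute(X, weights, n_rounds=5):
    """Iteratively refine missing entries (None) using weighted nearest neighbors.

    `weights` holds one nonnegative weight per neighbor rank and sums to 1.
    It is fixed here for illustration; F3I learns the weights instead.
    """
    k = len(weights)
    missing = [[v is None for v in row] for row in X]
    means = column_means(X)
    # Round 0: fill every gap with its column mean.
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

    for _ in range(n_rounds):
        updated = [row[:] for row in filled]
        for i, row in enumerate(filled):
            if not any(missing[i]):
                continue
            # Rank all other rows by Euclidean distance on the current filled values.
            others = sorted(
                (r for idx, r in enumerate(filled) if idx != i),
                key=lambda r: math.dist(row, r),
            )
            neighbors = others[:k]
            for j, is_missing in enumerate(missing[i]):
                if is_missing:
                    # Weighted average of the neighbors' values for this feature.
                    updated[i][j] = sum(w * nb[j] for w, nb in zip(weights, neighbors))
        filled = updated
    return filled

X = [
    [1.0, 2.0],
    [1.1, 2.1],
    [0.9, None],
    [5.0, 6.0],
    [4.9, 5.9],
    [5.1, None],
]
imputed = weighted_knn_impute(X, weights=[0.7, 0.3])
print(imputed[2], imputed[5])
```

In this toy example the two missing entries settle near 2.03 and 5.97, the weighted averages within their respective clusters. The first round starts from column means, so the neighbor ranking is still coarse; later rounds recompute distances on the refined values, which is the sense in which the imputation improves iteratively.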

Problem

Research questions and friction points this paper is trying to address.

Missing Data

Machine Learning

Imputation Methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

F3I method

online learning for missing data imputation

joint training of imputation with a downstream prediction or classification task

🔎 Similar Papers

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets