🤖 AI Summary
In biomedical machine learning, missing data are pervasive and severely degrade model performance and reliability. Existing imputation methods generalize poorly across missingness mechanisms (MCAR, MAR, MNAR) and across datasets. To address this, we propose a meta-imputation-based ensemble framework with three steps: first, the outputs of multiple base imputers are collected; second, a meta-model is trained on synthetically masked data to learn how each base imputer performs and to produce adaptive, weighted combinations of their outputs; third, the meta-model is refined via balanced training on masked data with ground-truth labels. The framework is designed to be both robust and interpretable. Extensive experiments show that our method consistently outperforms individual imputation approaches across all missingness scenarios, yielding an average 3.2% improvement in downstream classification/regression AUC and a 27% gain in prediction stability.
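To make the three missingness mechanisms concrete, here is a minimal sketch of how synthetic masks of each kind could be generated with NumPy. The function names (`mcar_mask`, `mar_mask`, `mnar_mask`, `apply_mask`), the logistic link, and the `target`/`driver` column choices are illustrative assumptions, not the paper's masking procedure.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _zscore(v):
    return (v - v.mean()) / v.std()


def mcar_mask(X, rate, rng):
    """MCAR: every cell is hidden with the same probability."""
    return rng.random(X.shape) < rate


def mar_mask(X, rate, rng, target=1, driver=0):
    """MAR: cells in `target` are hidden more often when the observed `driver` is large."""
    mask = np.zeros(X.shape, dtype=bool)
    # For a roughly symmetric driver the average probability is close to `rate`
    # (assumes rate <= 0.5 so that probabilities stay within [0, 1]).
    p = 2.0 * rate * _sigmoid(_zscore(X[:, driver]))
    mask[:, target] = rng.random(X.shape[0]) < p
    return mask


def mnar_mask(X, rate, rng, target=1):
    """MNAR: cells in `target` are hidden more often when their own value is large."""
    return mar_mask(X, rate, rng, target=target, driver=target)


def apply_mask(X, mask):
    """Return a float copy of X with the masked cells set to NaN."""
    X_masked = X.astype(float)  # astype returns a copy by default
    X_masked[mask] = np.nan
    return X_masked
```

In this sketch the only difference between MAR and MNAR is which column drives the missingness probability: another, still-observed column (MAR) or the very column being hidden (MNAR).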
📝 Abstract
Missing data is a fundamental challenge in machine learning applications, often reducing model performance and reliability. The problem is particularly acute in fields such as bioinformatics and clinical machine learning, where datasets are frequently incomplete because of how the data are generated and collected. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method performs consistently well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. The proposed method, Meta-Imputation Balanced (MIB), is trained on synthetically masked data with known ground truth, so that it learns to predict the most suitable imputed value from the behavior of each base imputer. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.
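The general idea of learning from base imputers on synthetically masked data can be sketched as a simple stacking setup with scikit-learn. This is an illustration under stated assumptions, not the MIB implementation: the choice of base imputers, the random-forest meta-model, the MCAR-style masking, and the helper names (`make_base_imputers`, `meta_features`, `fit_meta_imputer`, `meta_impute`) are all hypothetical, and the balancing step is omitted because the abstract does not specify it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer, SimpleImputer


def make_base_imputers():
    # Hypothetical set of base imputers; the paper's actual set is not given here.
    return [
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        KNNImputer(n_neighbors=5),
    ]


def meta_features(X_missing, cells):
    """One row per cell in `cells`: each base imputer's value, plus the column index.

    Assumes no column of X_missing is entirely NaN (such columns would be dropped).
    """
    rows, cols = cells
    preds = [imp.fit_transform(X_missing)[rows, cols] for imp in make_base_imputers()]
    return np.column_stack(preds + [cols])


def fit_meta_imputer(X, mask_rate=0.2, seed=0):
    """Hide observed cells at random, then learn the hidden truth from base outputs."""
    rng = np.random.default_rng(seed)
    hide = ~np.isnan(X) & (rng.random(X.shape) < mask_rate)
    X_masked = X.copy()
    X_masked[hide] = np.nan
    cells = np.nonzero(hide)
    meta = RandomForestRegressor(n_estimators=50, random_state=seed)
    meta.fit(meta_features(X_masked, cells), X[cells])  # targets are the known values
    return meta


def meta_impute(meta, X):
    """Fill the genuinely missing cells of X with the meta-model's predictions."""
    cells = np.nonzero(np.isnan(X))
    X_filled = X.copy()
    if cells[0].size:
        X_filled[cells] = meta.predict(meta_features(X, cells))
    return X_filled


# Usage on toy data: 10% of the cells are missing before any synthetic masking.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan
X_filled = meta_impute(fit_meta_imputer(X), X)
```

Because the synthetic mask is applied only to cells whose true values are known, the meta-model always has ground truth to train against; at imputation time the same feature construction is reused on the genuinely missing cells.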