Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning

📅 2025-09-03
🤖 AI Summary
In biomedical machine learning, missing data are pervasive and degrade model performance and reliability. Existing imputation methods generalize poorly across missingness mechanisms (MCAR, MAR, MNAR) and across datasets. To address this, we propose a meta-imputation ensemble framework with three steps: first, the outputs of multiple base imputers are collected; second, a meta-model is trained on synthetically masked data to learn how well each base imputer performs and to produce adaptive, weighted combinations of their outputs; third, the meta-model is refined via balanced training on masked data with ground-truth labels. The framework is designed for both robustness and interpretability. Extensive experiments demonstrate that the method consistently outperforms individual imputation approaches across all missingness scenarios, yielding an average 3.2% improvement in downstream classification/regression AUC and a 27% gain in prediction stability.
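
To make the three missingness mechanisms concrete, the following is a minimal sketch of how synthetic masks for one column could be generated under each of them. It is an illustration under stated assumptions, not the masking procedure used in the paper: the function name `make_masks`, its parameters, and the deterministic "hide the top fraction" rule for MAR and MNAR are all hypothetical simplifications.

```python
import random


def make_masks(rows, target=0, driver=1, rate=0.3, seed=0):
    """Return {"MCAR"|"MAR"|"MNAR": mask}, where mask[i] is True if
    rows[i][target] should be hidden. Illustrative only (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(rows)
    k = int(rate * n)  # number of cells to hide in the target column

    # Row indices ordered by the driver column and by the target column.
    by_driver = sorted(range(n), key=lambda i: rows[i][driver])
    by_target = sorted(range(n), key=lambda i: rows[i][target])

    hidden = {
        # MCAR: every row is equally likely to lose its target value.
        "MCAR": set(rng.sample(range(n), k)),
        # MAR: missingness depends only on another, fully observed column.
        "MAR": set(by_driver[n - k:]),
        # MNAR: missingness depends on the value that ends up hidden.
        "MNAR": set(by_target[n - k:]),
    }
    return {name: [i in idx for i in range(n)] for name, idx in hidden.items()}
```

A probabilistic rule (for example, a hiding probability that increases with the driver value) would be closer to how MAR and MNAR are usually simulated; the hard threshold is used here only to keep the sketch short.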

📝 Abstract
Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. The proposed method, called Meta-Imputation Balanced (MIB), is trained on synthetically masked data with known ground truth, so that it learns to predict the most suitable imputed value based on the behavior of each base imputer. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.
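
The general idea of learning how to combine base imputers from synthetically masked cells can be sketched as follows. This is not the MIB model itself: the abstract does not specify the meta-learner or the base imputers, so this sketch swaps in three simple base imputers (mean, median, k-nearest-neighbour) and a plain inverse-error weighting in place of a trained meta-model. All function names and parameters are hypothetical, and missing values are assumed to be encoded as `None`.

```python
import random
import statistics


def mean_imputer(rows, i, j):
    """Mean of the observed values in column j (row index i is unused)."""
    return statistics.fmean([r[j] for r in rows if r[j] is not None])


def median_imputer(rows, i, j):
    """Median of the observed values in column j (row index i is unused)."""
    return statistics.median([r[j] for r in rows if r[j] is not None])


def knn_imputer(rows, i, j, k=3):
    """Mean of column j over the k rows closest to row i on shared observed columns."""
    def dist(other):
        sq = [(a - b) ** 2 for c, (a, b) in enumerate(zip(rows[i], other))
              if c != j and a is not None and b is not None]
        return sum(sq) / len(sq) if sq else float("inf")

    donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
    nearest = sorted(donors, key=dist)[:k]
    return statistics.fmean([r[j] for r in nearest])


BASE_IMPUTERS = [mean_imputer, median_imputer, knn_imputer]


def fit_weights(rows, n_masks=200, seed=0):
    """Hide observed cells one at a time, score each base imputer against the
    known truth, and return weights proportional to 1 / squared error."""
    rng = random.Random(seed)
    observed = [(i, j) for i, r in enumerate(rows)
                for j, v in enumerate(r) if v is not None]
    sq_err = [0.0] * len(BASE_IMPUTERS)
    for i, j in rng.sample(observed, min(n_masks, len(observed))):
        truth = rows[i][j]
        rows[i][j] = None  # synthetic mask with known ground truth
        try:
            for m, imputer in enumerate(BASE_IMPUTERS):
                sq_err[m] += (imputer(rows, i, j) - truth) ** 2
        finally:
            rows[i][j] = truth  # always restore the original value
    inv = [1.0 / (e + 1e-12) for e in sq_err]
    total = sum(inv)
    return [w / total for w in inv]


def meta_impute(rows, weights):
    """Fill each missing cell with the weighted combination of base imputations."""
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is None:
                filled[i][j] = sum(w * imp(rows, i, j)
                                   for w, imp in zip(weights, BASE_IMPUTERS))
    return filled
```

Calling `fit_weights(rows)` followed by `meta_impute(rows, weights)` on a list of numeric rows returns a completed copy and leaves the input unchanged. A single global weight vector is the simplest possible combiner; a learned meta-model could instead condition the combination on the base imputers' outputs for each individual cell, which is closer to what the abstract describes.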

Problem

Research questions and friction points this paper is trying to address.

Handling missing data in biomedical machine learning
Combining multiple imputation methods for better accuracy
Improving robustness and reliability of imputation techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble learning combines multiple base imputers
Meta-Imputation Balanced (MIB) trains on synthetically masked data with known ground truth
Learns optimal imputation from diverse method behaviors
Fatemeh Azad
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Zoran Bosnić
University of Ljubljana, Faculty of Computer and Information Science
machine learning, data mining, incremental learning, recommender systems, semi-supervised learning
Matjaž Kukar
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia