Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning

📅 2025-09-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In biomedical machine learning, missing data are pervasive and severely degrade model performance and reliability. Existing imputation methods exhibit poor generalizability across diverse missingness mechanisms (MCAR, MAR, MNAR) and datasets. To address this, we propose a meta-imputation-based ensemble framework: first, outputs from multiple base imputers are fused; second, a meta-model is trained on synthetically masked data to learn the dynamic performance of each base imputer and produce adaptive, weighted combinations; third, the meta-model is refined via balanced training on masked data with ground-truth labels. The framework ensures both robustness and interpretability. Extensive experiments demonstrate that our method consistently outperforms individual imputation approaches across all missingness scenarios—yielding an average 3.2% improvement in downstream classification/regression AUC and a 27% gain in prediction stability.

Technology Category

Application Category

📝 Abstract

Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. By training the proposed method called Meta-Imputation Balanced (MIB) on synthetically masked data with known ground truth, the system learns to predict the most suitable imputed value based on the behavior of each method. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.

Problem

Research questions and friction points this paper is trying to address.

Handling missing data in biomedical machine learning

Combining multiple imputation methods for better accuracy

Improving robustness and reliability of imputation techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble learning combines multiple base imputers

Meta-Imputation Balanced trains on synthetic masked data

Learns optimal imputation from diverse method behaviors

🔎 Similar Papers

Learnable Prompt as Pseudo-Imputation: Rethinking the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction

2024-01-30Citations: 0

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets

2024-07-16arXiv.orgCitations: 4

Authors to Follow