🤖 AI Summary
Missing values frequently bias or break analyses, yet no single imputation method is universally suitable across tasks. To address this, we propose MVIAnalyzer, a generic analytical framework that embeds missing value imputation (MVI) into the end-to-end data analysis workflow, up to the application and analysis of machine learning methods. The framework supports assessment before and after imputation and covers data generation, machine learning modeling, and result visualization. It includes a configurable missingness simulation for the MCAR, MAR, and MNAR mechanisms with control over the relevant parameters, and the accompanying software is provided so that other researchers can run their own benchmarks across data types, models, and evaluation metrics. Experiments on data with different characteristics demonstrate the framework and show the possibilities and limitations of common imputation methods across tasks. MVIAnalyzer thereby offers a reproducible, extensible basis for both MVI research and practical use.
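To make the three missingness mechanisms concrete, the following is a minimal sketch in plain Python. It is not the MVIAnalyzer implementation: the function name `simulate_missing`, its parameters, and the deterministic "largest values go missing first" rule used for MAR and MNAR are assumptions chosen for illustration. Real simulations typically draw missingness probabilistically.

```python
import random


def simulate_missing(rows, target, mechanism="MCAR", rate=0.2, driver=None, seed=0):
    """Return a copy of `rows` (a list of dicts) with None inserted in `target`.

    Hypothetical helper for illustration only.
    MCAR: cells are removed uniformly at random.
    MAR:  removal depends on another, fully observed column (`driver`).
    MNAR: removal depends on the value of `target` itself.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy each row so the input stays intact
    n = len(out)
    k = round(rate * n)            # number of cells to remove

    if mechanism == "MCAR":
        idx = rng.sample(range(n), k)
    elif mechanism == "MAR":
        if driver is None:
            raise ValueError("MAR needs a driver column")
        # Simplification: rows with the largest driver values lose `target`.
        idx = sorted(range(n), key=lambda i: out[i][driver], reverse=True)[:k]
    elif mechanism == "MNAR":
        # Simplification: the largest target values are the ones removed.
        idx = sorted(range(n), key=lambda i: out[i][target], reverse=True)[:k]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    for i in idx:
        out[i][target] = None
    return out


if __name__ == "__main__":
    data = [{"x": 9 - i, "y": i} for i in range(10)]
    for mech in ("MCAR", "MAR", "MNAR"):
        masked = simulate_missing(data, "y", mech, rate=0.3, driver="x")
        print(mech, [r["y"] for r in masked])
```

The `rate` and `driver` arguments stand in for the kind of parameters such a simulation would expose. Under MNAR the removed cells are systematically the large ones, which is why simple imputers tend to be biased in that setting.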
📝 Abstract
Missing values often limit the applicability of data analysis or distort its results. Methods for missing value imputation (MVI) are therefore of great significance. In general, however, there is no universal, fair MVI method for different tasks. This work therefore places MVI in the overall context of data analysis. To this end, we present the MVIAnalyzer, a generic framework for a holistic analysis of MVI that covers the entire process up to the application and analysis of machine learning methods. The associated software is provided and can be used by other researchers for their own analyses. It also includes a missing value simulation that takes relevant parameters into account. The application of the MVIAnalyzer is demonstrated on data with different characteristics, and an evaluation of the results shows the possibilities and limitations of different MVI methods. Since MVI is a complex topic with many influencing variables, this paper additionally illustrates how visualizations can support the analysis.
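One step of such an analysis, comparing imputed values against the known ground truth after simulated missingness, can be sketched as follows. This is an illustrative assumption, not the paper's evaluation procedure: mean imputation and RMSE are stand-ins chosen for brevity, and the helper names `mean_impute` and `imputation_rmse` are hypothetical.

```python
import math


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def imputation_rmse(truth, masked, imputed):
    """RMSE between truth and imputed values, on the masked cells only."""
    squared = [
        (t - p) ** 2
        for t, m, p in zip(truth, masked, imputed)
        if m is None  # score only cells that were actually missing
    ]
    if not squared:
        return 0.0
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0, 4.0]
    masked = [1.0, None, 3.0, None]
    imputed = mean_impute(masked)
    print(imputed, imputation_rmse(truth, masked, imputed))
```

Scoring only the masked cells keeps the metric from being diluted by values that were never missing. A holistic analysis in the sense of the abstract would additionally compare downstream machine learning performance on the imputed data.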