Beyond Accuracy: An Empirical Study of Uncertainty Estimation in Imputation

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Uncertainty quantification in missing data imputation is often overlooked, and the relationship between calibration quality and imputation accuracy remains poorly understood. This paper presents the first systematic empirical evaluation of six state-of-the-art imputation methods—statistical (MICE, SoftImpute), distribution-alignment (OT-Impute), and deep generative (GAIN, MIWAE, TabCSDI)—across multiple real-world datasets, under MCAR, MAR, and MNAR missingness mechanisms, and across varying missing rates. We propose a multi-path evaluation framework integrating repeated sampling variability, conditional distribution modeling, and predictive confidence quantification to rigorously assess uncertainty calibration. Results reveal that high imputation accuracy does not imply well-calibrated uncertainty estimates; significant trade-offs exist among accuracy, calibration fidelity, and computational efficiency across method categories. We identify several robust, reproducible configurations, providing actionable, evidence-based guidance for model selection in downstream machine learning and data cleaning tasks.

📝 Abstract
Handling missing data is a central challenge in data-driven analysis. Modern imputation methods not only aim for accurate reconstruction but also differ in how they represent and quantify uncertainty. Yet the reliability and calibration of these uncertainty estimates remain poorly understood. This paper presents a systematic empirical study of uncertainty in imputation, comparing representative methods from three major families: statistical (MICE, SoftImpute), distribution alignment (OT-Impute), and deep generative (GAIN, MIWAE, TabCSDI). Experiments span multiple datasets, missingness mechanisms (MCAR, MAR, MNAR), and missingness rates. Uncertainty is estimated through three complementary routes (multi-run variability, conditional sampling, and predictive-distribution modeling) and evaluated using calibration curves and the Expected Calibration Error (ECE). Results show that accuracy and calibration are often misaligned: models with high reconstruction accuracy do not necessarily yield reliable uncertainty. We analyze method-specific trade-offs among accuracy, calibration, and runtime, identify stable configurations, and offer guidelines for selecting uncertainty-aware imputers in data cleaning and downstream machine learning pipelines.
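The abstract names calibration curves and ECE as the evaluation tools but does not spell out the binning or interval scheme here. Below is a minimal sketch, assuming a central-interval coverage formulation that is common for probabilistic imputation: for each nominal confidence level, compare the fraction of held-out true values falling inside the central predictive interval against the level itself, then average the absolute gaps. The function names and toy data are illustrative, not from the paper.

```python
import numpy as np

def empirical_coverage(samples, truth, level):
    """Fraction of ground-truth values inside the central
    `level` interval of each cell's predictive samples.
    samples: (K, n_cells) imputation draws; truth: (n_cells,)."""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo = np.quantile(samples, lo_q, axis=0)
    hi = np.quantile(samples, hi_q, axis=0)
    return np.mean((truth >= lo) & (truth <= hi))

def expected_calibration_error(samples, truth, levels=None):
    """ECE as the mean absolute gap between nominal and
    empirical coverage across a grid of confidence levels."""
    if levels is None:
        levels = np.linspace(0.1, 0.9, 9)
    gaps = [abs(empirical_coverage(samples, truth, p) - p) for p in levels]
    return float(np.mean(gaps))

# Toy check: a well-calibrated predictive distribution.
# Truth is an independent draw from the same per-cell distribution
# the samples come from, so coverage should track the nominal level.
rng = np.random.default_rng(0)
mu = rng.normal(size=1000)
truth = mu + rng.normal(scale=1.0, size=1000)
samples = mu + rng.normal(scale=1.0, size=(50, 1000))
print(expected_calibration_error(samples, truth))  # close to 0
```

A calibration curve follows from the same quantities: plot empirical coverage against the nominal level instead of averaging the gaps.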
Problem

Research questions and friction points this paper is trying to address.

Evaluates uncertainty estimation reliability in imputation methods
Compares statistical, distribution alignment, and deep generative imputation techniques
Analyzes trade-offs between accuracy, calibration, and runtime in imputation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic comparison of imputation methods across statistical, distribution alignment, and deep generative families
Evaluation of uncertainty via multi-run variability, conditional sampling, and predictive-distribution modeling (a minimal sketch of the multi-run route appears after this list)
Analysis of trade-offs between accuracy, calibration, and runtime to guide uncertainty-aware imputer selection
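As referenced above, here is a minimal sketch of the multi-run variability route, assuming a stochastic imputer wrapped in a hypothetical `impute_fn(X, rng)` callable; the per-cell spread across repeated runs serves as the uncertainty estimate. None of these names come from the paper, and the mean imputer below is only a stand-in for MICE, GAIN, or any other stochastic method.

```python
import numpy as np

def multi_run_uncertainty(impute_fn, X_missing, n_runs=20, seed=0):
    """Estimate per-cell uncertainty by repeating a stochastic
    imputer and taking the spread of the completed values."""
    rng = np.random.default_rng(seed)
    runs = np.stack([impute_fn(X_missing, rng) for _ in range(n_runs)])
    point = runs.mean(axis=0)    # per-cell point estimate
    spread = runs.std(axis=0)    # per-cell multi-run variability
    return point, spread

def mean_imputer(X, rng):
    """Toy stochastic imputer: fill NaNs with noisy column means."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        mask = np.isnan(X[:, j])
        X[mask, j] = col_means[j] + rng.normal(scale=0.1, size=mask.sum())
    return X

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]])
point, spread = multi_run_uncertainty(mean_imputer, X)
```

The spread is only meaningful on originally missing cells; observed cells are identical across runs and get zero variability by construction.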
Zarin Tahia Hossain
Department of Computer Science, Western University, London, Ontario, Canada
Mostafa Milani
Assistant Professor, The University of Western Ontario
Data Quality · Data Cleaning