🤖 AI Summary
Problem: Missing physiological measurements (e.g., heart rate) in ICU time-series data can degrade the performance of clinical prediction models, yet existing imputation methods lack a systematic, task-oriented evaluation.
Method: We establish an extensible, reusable benchmark framework that compares 15 time-series imputation methods, including mean imputation, linear interpolation, last-observation-carried-forward (LOCF), Kalman filtering, and deep learning models, alongside 4 amputation methods (procedures that deliberately remove observed values to simulate missingness), across major ICU datasets. Using these controlled missingness simulations, we quantify how each method affects downstream prediction tasks.
Contribution/Results: Our empirical analysis reveals that optimal imputation significantly improves model accuracy, whereas common heuristics (e.g., zero imputation) introduce substantial bias and performance degradation. The study delivers an evidence-based, clinically informed guide for selecting imputation strategies, advancing standardization and reliability in preprocessing temporal healthcare data for machine learning.
📝 Abstract
As more Intensive Care Unit (ICU) data becomes available, interest in developing clinical prediction models to improve healthcare protocols is growing. However, poor data quality still hinders clinical prediction with Machine Learning (ML). Many vital sign measurements, such as heart rate, contain sizeable missing segments, and these gaps can degrade prediction performance. Previous work has introduced numerous time-series imputation techniques, but a more comprehensive comparison of a representative set of methods for imputing ICU vital signs is still needed to establish best practice. In practice, ad-hoc techniques that can decrease prediction accuracy, such as zero imputation, are still in use. In this work, we compare established imputation techniques to help researchers improve clinical prediction models by selecting the most accurate imputation technique. We introduce an extensible and reusable benchmark that currently includes 15 imputation and 4 amputation methods and is designed for benchmarking on major ICU datasets. We hope to provide a comparative basis and to facilitate further ML development that brings more models into clinical practice.
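To make the amputate, impute, and score loop concrete, the following minimal sketch hides values from a complete signal, fills them back in with four simple baselines, and measures the error on the hidden points only. It is an illustration under stated assumptions, not the benchmark's implementation: the function names (`amputate_mcar`, `impute_zero`, `impute_mean`, `impute_locf`, `impute_linear`, `mae_on_hidden`) are hypothetical, the completely-at-random masking is just one possible amputation mechanism and may not match any of the benchmark's four, and the sinusoidal "heart rate" is synthetic rather than ICU data.

```python
import math
import random


def amputate_mcar(series, missing_rate, seed=0):
    """Hide a random fraction of values; return the masked copy and hidden indices.

    Assumption: values go missing completely at random (MCAR).
    """
    rng = random.Random(seed)
    n_hide = int(round(missing_rate * len(series)))
    hidden = sorted(rng.sample(range(len(series)), n_hide))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_zero(masked):
    """Ad-hoc baseline: replace every gap with 0."""
    return [0.0 if v is None else v for v in masked]


def impute_mean(masked):
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def impute_locf(masked):
    """Last observation carried forward; leading gaps take the first observed value."""
    last = next(v for v in masked if v is not None)
    out = []
    for v in masked:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(masked):
    """Linear interpolation between neighbours; edge gaps take the nearest observed value."""
    observed_idx = [i for i, v in enumerate(masked) if v is not None]
    out = list(masked)
    for i, v in enumerate(masked):
        if v is not None:
            continue
        left = max((j for j in observed_idx if j < i), default=None)
        right = min((j for j in observed_idx if j > i), default=None)
        if left is None:
            out[i] = masked[right]
        elif right is None:
            out[i] = masked[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = masked[left] + weight * (masked[right] - masked[left])
    return out


def mae_on_hidden(truth, imputed, hidden):
    """Mean absolute error, evaluated only where values were artificially removed."""
    return sum(abs(truth[i] - imputed[i]) for i in hidden) / len(hidden)


# Synthetic heart-rate-like signal (beats per minute), one sample per time step.
truth = [75.0 + 10.0 * math.sin(2 * math.pi * t / 60) for t in range(240)]
masked, hidden = amputate_mcar(truth, missing_rate=0.3, seed=42)

methods = {
    "zero": impute_zero,
    "mean": impute_mean,
    "locf": impute_locf,
    "linear": impute_linear,
}
scores = {name: mae_on_hidden(truth, fn(masked), hidden) for name, fn in methods.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>6}: MAE = {score:.2f}")
```

On this synthetic signal, zero imputation necessarily scores worst because every true value lies far from zero. The ranking of the remaining methods depends on the signal and the missingness pattern, which is the kind of question a benchmark over several amputation mechanisms and real ICU datasets is meant to answer.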