Beyond Random Missingness: Clinically Rethinking for Healthcare Time Series Imputation

📅 2024-05-26
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
Current evaluation of medical time-series imputation models relies heavily on the random missingness assumption, overlooking clinically prevalent non-random, structured missingness patterns—leading to assessments that poorly reflect real-world clinical utility. Method: Leveraging the PhysioNet Challenge 2012 dataset, we systematically benchmark 11 state-of-the-art imputation methods—including RNN-, GAN-, and Transformer-based approaches—and introduce a clinically informed masking strategy to jointly evaluate imputation accuracy and downstream mortality prediction performance. Contributions/Results: (1) We provide the first empirical evidence that imputation accuracy does not necessarily correlate with clinical prediction AUC; several high-accuracy models fail to improve—or even degrade—mortality prediction; (2) RNN-based models demonstrate superior robustness under structured missingness; (3) Optimized clinical masking improves mortality prediction AUC by up to 3.2%. This work shifts imputation evaluation from a purely technical paradigm toward one grounded in clinical utility.

Technology Category

Application Category

📝 Abstract
This study investigates the impact of masking strategies on time series imputation models in healthcare settings. While current approaches predominantly rely on random masking for model evaluation, this practice fails to capture the structured nature of missing patterns in clinical data. Using the PhysioNet Challenge 2012 dataset, we analyse how different masking implementations affect both imputation accuracy and downstream clinical predictions across eleven imputation methods. Our results demonstrate that masking choices significantly influence model performance, while recurrent architectures show more consistent performance across strategies. Analysis of downstream mortality prediction reveals that imputation accuracy doesn't necessarily translate to optimal clinical prediction capabilities. Our findings emphasise the need for clinically-informed masking strategies that better reflect real-world missing patterns in healthcare data, suggesting current evaluation frameworks may need reconsideration for reliable clinical deployment.
Problem

Research questions and friction points this paper is trying to address.

Impact of masking strategies on imputation models
Structured missing patterns in clinical data
Imputation accuracy versus clinical prediction capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinically-informed masking strategies
Analysis of downstream clinical predictions
Recurrent architectures' consistent performance
🔎 Similar Papers
No similar papers found.
L
Linglong Qian
Institute of Psychiatry, Psychology and Neuroscience, King’s College London; PyPOTS Research
Zina Ibrahim
Zina Ibrahim
Institute of Psychiatry, Psychology and Neuroscience, King’s College London
W
Wenjie Du
PyPOTS Research
Yiyuan Yang
Yiyuan Yang
Department of Computer Science, University of Oxford
Signal processingData miningTime seriesMultimodalityMachine learning
R
Richard Dobson
Institute of Psychiatry, Psychology and Neuroscience, King’s College London; University College London; Health Data Research UK
J
Jun Wang