🤖 AI Summary
Clinical time-series data suffer from severe missingness due to label scarcity and patient heterogeneity. To address this, we propose MIEO, a missingness-aware self-supervised representation learning framework. MIEO employs a novel missingness-aware autoencoder to learn robust latent patient representations from large-scale unlabeled electronic health records (EHRs), effectively mitigating data sparsity and missing-value interference. These representations are integrated with a lightweight classifier for cardiovascular mortality risk prediction. Evaluated on a real-world ischemic heart disease dataset, MIEO achieves a +4.2% improvement in balanced accuracy over state-of-the-art semi-supervised and imputation-based methods. To our knowledge, this is the first work to incorporate missingness-aware self-supervised encoding into clinical time-series representation learning. It demonstrates strong efficacy and generalizability under low-label-resource settings, offering a promising direction for robust predictive modeling in real-world EHR analytics.
📝 Abstract
As clinical data are becoming increasingly available, machine learning methods have been employed to extract knowledge from them and predict clinical events. While promising, approaches suffer from at least two main issues: low availability of labelled data and data heterogeneity leading to missing values. This work proposes the use of self-supervised auto-encoders to efficiently address these challenges. We apply our methodology to a clinical dataset from patients with ischaemic heart disease. Patient data is embedded in a latent space, built using unlabelled data, which is then used to train a neural network classifier to predict cardiovascular death. Results show improved balanced accuracy compared to applying the classifier directly to the raw data, demonstrating that this solution is promising, especially in conditions where availability of unlabelled data could increase.