AI Summary
To address irregular sampling, high missingness rates, and class imbalance in electronic health record (EHR) data, this paper proposes MUSE-Net, a missingness-aware multi-branching self-attention encoder. The method integrates mask-guided multi-task Gaussian process imputation, time-aware self-attention, interpretable multi-head attention, and a multi-branching architecture for class balance to jointly model temporal dynamics, missingness patterns, and clinical interpretability. Evaluated on synthetic benchmarks and multiple real-world EHR datasets (e.g., MIMIC-IV), MUSE-Net consistently outperforms state-of-the-art time-series models: it improves disease prediction accuracy by 3.2–5.8%, enhances attribution fidelity for critical time points (AUC +12.4%), and generalizes better to minority classes when samples are limited. These results point toward clinical decision support systems that combine high predictive performance with model transparency.
Abstract
The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provide unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multivariate time series, incompleteness, and data imbalance. Realizing the full potential of EHR data hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-Attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The proposed MUSE-Net is composed of four novel modules: (1) a multi-task Gaussian process (MGP) with missing value masks for data imputation; (2) a multi-branching architecture to address the data imbalance problem; (3) a time-aware self-attention encoder to account for the irregularly spaced time intervals in longitudinal EHRs; and (4) an interpretable multi-head attention mechanism that provides insights into the importance of different time points in disease prediction, allowing clinicians to trace model decisions. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals.
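To make the ideas of a missing value mask and a time-aware encoder more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It builds a binary mask for unobserved entries, encodes real-valued (irregular) timestamps with a sinusoidal function, and computes single-head scaled dot-product attention weights over the visits. All function names, the toy record, the sinusoidal encoding, and the identity query/key projections are assumptions made for illustration. Zero-filling stands in for the MGP imputation described in the abstract.

```python
import math

def missing_mask(series):
    """Return 1 where a value was observed and 0 where it is missing (None)."""
    return [[0 if v is None else 1 for v in row] for row in series]

def time_encoding(t, dim=4, max_period=10000.0):
    """Sinusoidal encoding of a real-valued timestamp (an assumed choice)."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * i / dim))
        enc.extend([math.sin(t * freq), math.cos(t * freq)])
    return enc

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_weights(features):
    """Scaled dot-product attention weights with identity Q/K projections."""
    d = len(features[0])
    weights = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights.append(softmax(scores))
    return weights

# Toy record (hypothetical): 3 visits at irregular times in days,
# 2 lab variables, None marks a value that was not measured.
times = [0.0, 2.5, 30.0]
values = [[1.2, None], [None, 0.7], [1.0, 0.9]]

mask = missing_mask(values)
# Zero-filling is a placeholder for the values an MGP would impute.
filled = [[0.0 if v is None else v for v in row] for row in values]
# Per-visit feature vector: filled values + mask + time encoding.
features = [
    f + [float(m) for m in mk] + time_encoding(t)
    for f, mk, t in zip(filled, mask, times)
]
attn = self_attention_weights(features)  # attn[i][j]: weight of visit j for visit i
```

Each row of `attn` sums to 1 and can be read as how strongly one visit attends to the others, which is the kind of per-time-point importance the interpretable attention module is meant to expose. A real model would learn the query, key, and value projections and use several heads.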