🤖 AI Summary
Patient similarity computation (PSC) faces three key challenges: low computational efficiency, difficulty in fusing heterogeneous multi-source data, and insufficient privacy protection. To address these, this paper proposes an efficient, privacy-preserving, distributed PSC framework for clinical decision support. First, it applies adaptive Weight-of-Evidence (aWOE) and Z-score transformations to static demographic data, with aWOE used to process sensitive patient information in a privacy-preserving way. Second, it uses distributed Dynamic Time Warping (DTW) computation to overcome the scalability bottleneck of conventional DTW on large-scale physiological time-series data (e.g., heart rate, blood pressure). Third, it clusters patients on their static data as a preprocessing step. Evaluated on coronary artery disease and congestive heart failure prediction tasks, the framework improves AUC by up to 15.9%, accuracy by up to 10.5%, and F-measure by up to 21.9%, and reduces computation time by up to 40%.
📝 Abstract
Patient similarity computation (PSC) is a fundamental problem in healthcare informatics. Its aim is to measure the similarity among patients according to their historical clinical records, which helps to improve clinical decision support. This paper presents a novel distributed patient similarity computation (DPSC) technique based on data transformation (DT) methods, using an effective combination of time series and static data. The time series data are sensor-collected patient measurements such as heart rate, blood pressure, oxygen saturation, and respiration. The static data are mainly patient background and demographic data, including age, weight, height, and gender. The static data are used to cluster the patients. Before the static data are fed to the machine learning model, they are transformed with the adaptive Weight-of-Evidence (aWOE) and Z-score DT methods, which improves prediction performance. In aWOE-based patient similarity models, sensitive patient information is processed with aWOE, which preserves the data privacy of the trained models. For time series similarity we use Dynamic Time Warping (DTW), a robust and widely used approach. However, DTW is not suitable for big data because of its substantial computational run time. To overcome this problem, this study uses distributed DTW computation. For Coronary Artery Disease, our DT-based approach boosts prediction performance by as much as 11.4%, 10.2%, and 12.6% in terms of AUC, accuracy, and F-measure, respectively. For Congestive Heart Failure (CHF), the proposed method improves the same measures by up to 15.9%, 10.5%, and 21.9%, respectively. The proposed method also reduces computation time by up to 40%.
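To make the DTW step concrete, the following is a minimal Python sketch of classic DTW together with a pairwise distance routine whose work can be handed to a parallel or distributed `map`. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not describe how the DTW computation is actually distributed, and the names `dtw_distance`, `pairwise_dtw`, and `map_fn`, the absolute-difference local cost, and the toy heart-rate values are all hypothetical.

```python
from itertools import combinations
import math


def dtw_distance(a, b):
    """Classic DTW between two numeric sequences, O(len(a) * len(b)) time.

    Assumption: absolute difference as the local cost and no warping window.
    """
    n, m = len(a), len(b)
    # prev[j] is the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def _pair_distance(task):
    """Compute one (i, j) entry; each task is independent of all others."""
    i, j, a, b = task
    return i, j, dtw_distance(a, b)


def pairwise_dtw(series, map_fn=map):
    """Symmetric DTW distance matrix for a list of sequences.

    map_fn is a hypothetical hook: the built-in map runs serially, while a
    parallel or distributed map with the same call shape can be swapped in.
    """
    n = len(series)
    tasks = [(i, j, series[i], series[j])
             for i, j in combinations(range(n), 2)]
    dist = [[0.0] * n for _ in range(n)]
    for i, j, d in map_fn(_pair_distance, tasks):
        dist[i][j] = dist[j][i] = d
    return dist


# Toy heart-rate series for three hypothetical patients.
series = [[72, 75, 80], [72, 75, 75, 80], [90, 95, 99]]
dist = pairwise_dtw(series)
```

Because every pair is an independent task, the serial `map` can be replaced by, for example, the `map` method of a `concurrent.futures.ProcessPoolExecutor` without changing `pairwise_dtw`. In the toy data, the first two series differ only by a repeated sample, so DTW aligns them at zero cost, which is the kind of robustness to timing differences that motivates DTW for physiological signals.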