ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts

📅 2025-06-06
🤖 AI Summary
This study addresses patient stratification from temporal electronic health records (EHRs) collected in intensive care units (ICUs). It introduces ICU-TSB, a benchmark for unsupervised, hierarchical ICU patient stratification, together with an evaluation framework that uses the ICD disease taxonomy to quantify how well discovered clusters align with diagnosis groups. The experiments compare statistical feature representations with recurrent sequence models (LSTM and GRU) and evaluate several strategies for assigning interpretable labels to clusters. On MIMIC-III, eICU, and AmsterdamUMC, the best v-measure scores reach 0.46 at the top level of the taxonomy and 0.40 at the lowest level, which indicates that the task remains challenging. Code, preprocessing pipelines, and experimental configurations are open-sourced for reproducibility. Core contributions: (1) a hierarchical ICU temporal benchmark, described by the authors as the first of its kind; (2) an evaluation paradigm based on clinical alignment; and (3) a comparison of representation and cluster-labeling methods within that framework.

📝 Abstract
Patient stratification, the identification of clinically meaningful subgroups, is essential for advancing personalized medicine through improved diagnostics and treatment strategies. Electronic health records (EHRs), particularly those from intensive care units (ICUs), contain rich temporal clinical data that can be leveraged for this purpose. In this work, we introduce ICU-TSB (Temporal Stratification Benchmark), the first comprehensive benchmark for evaluating patient stratification based on temporal patient representation learning using three publicly available ICU EHR datasets. A key contribution of our benchmark is a novel hierarchical evaluation framework that uses disease taxonomies to measure the alignment of discovered clusters with clinically validated disease groupings. In our experiments with ICU-TSB, we compared statistical methods and several recurrent neural networks, including LSTM and GRU, for their ability to generate effective patient representations for subsequent clustering of patient trajectories. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts; nevertheless, it remains a challenging task, with the v-measure ranging from up to 0.46 at the top level of the taxonomy to up to 0.40 at the lowest level. To further enhance the practical utility of our findings, we also evaluate multiple strategies for assigning interpretable labels to the identified clusters. The experiments and benchmark are fully reproducible and available at https://github.com/ds4dh/CBMS2025stratification.
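
The hierarchical evaluation described in the abstract can be illustrated with a minimal sketch. It is not the benchmark's own code: it computes the v-measure (the harmonic mean of homogeneity and completeness) from scratch and scores one clustering against diagnosis codes truncated at several depths. The function `hierarchical_v_measure`, the prefix-truncation rule, and the toy codes are assumptions made for illustration; a real ICD hierarchy needs a proper mapping from codes to chapters and blocks.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given) for two equally long label lists."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def hierarchical_v_measure(codes, clusters, depths):
    """Score one clustering against the taxonomy truncated at each depth.

    Assumption: a code prefix of length `depth` identifies the taxonomy node
    at that level. Real ICD hierarchies need a proper code-to-chapter mapping.
    """
    return {d: v_measure([code[:d] for code in codes], clusters) for d in depths}


# Toy data: eight stays with made-up ICD-10-style category codes.
codes = ["I21", "I21", "I50", "I50", "J18", "J18", "J44", "J44"]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
scores = hierarchical_v_measure(codes, clusters, depths=(1, 3))
```

In this toy case the two clusters separate the top-level groups perfectly (score 1.0 at depth 1) but cannot distinguish the categories inside each group, so the score drops to about 0.67 at depth 3. The same pattern, lower agreement at finer taxonomy levels, is what the reported 0.46 and 0.40 figures describe.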
Problem

Research questions and friction points this paper is trying to address.

Evaluating patient stratification using temporal EHR data
Comparing methods for temporal patient representation learning
Assessing alignment of clusters with clinical disease groups
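
As a rough illustration of the statistical side of this comparison, the sketch below turns a variable-length ICU stay into a fixed-length vector of summary statistics that any standard clustering algorithm could consume. The chosen statistics, the zero-fill for unmeasured variables, and the variable names are assumptions for illustration only; the source does not specify which features the benchmark uses.

```python
from statistics import fmean, pstdev


def summarize_stay(series, variables):
    """Flatten one ICU stay into a fixed-length vector.

    `series` maps a variable name to its chronologically ordered measurements.
    Each variable contributes [mean, std, min, max, last]; a variable with no
    measurements contributes zeros (a placeholder imputation choice).
    """
    vector = []
    for name in variables:
        values = series.get(name, [])
        if not values:
            vector.extend([0.0] * 5)
            continue
        vector.extend([fmean(values), pstdev(values), min(values), max(values), values[-1]])
    return vector


variables = ["heart_rate", "spo2", "lactate"]  # hypothetical variable names
stay = {"heart_rate": [80.0, 90.0, 100.0], "spo2": [98.0, 98.0]}
features = summarize_stay(stay, variables)
```

Passing the same `variables` list for every stay keeps the vectors aligned across patients. A recurrent encoder such as an LSTM or GRU would replace this function with a learned mapping from the full sequence to a vector.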
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical evaluation framework using disease taxonomies
Comparison of statistical methods and recurrent neural networks
Strategies for assigning interpretable labels to clusters
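
One simple way to attach an interpretable label to a cluster is a majority vote over the diagnosis groups of its members. The sketch below shows that idea only; the source does not say which labeling strategies the paper evaluates, and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def label_clusters_by_majority(clusters, diagnoses):
    """Return {cluster_id: (majority_label, purity)}.

    Purity is the fraction of cluster members carrying the majority label.
    """
    members = defaultdict(list)
    for cluster_id, diagnosis in zip(clusters, diagnoses):
        members[cluster_id].append(diagnosis)
    labels = {}
    for cluster_id, group in members.items():
        label, count = Counter(group).most_common(1)[0]
        labels[cluster_id] = (label, count / len(group))
    return labels


# Toy data: seven stays, two clusters, made-up diagnosis groups.
clusters = [0, 0, 0, 1, 1, 1, 1]
diagnoses = ["circulatory", "circulatory", "respiratory",
             "respiratory", "respiratory", "respiratory", "circulatory"]
cluster_labels = label_clusters_by_majority(clusters, diagnoses)
```

The purity value gives a quick sense of how trustworthy each label is. When two diagnosis groups tie within a cluster, `Counter.most_common` returns the one encountered first, so a production version would need an explicit tie-breaking rule.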
Authors

Dimitrios Proios
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
Alban Bornet
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
Anthony Yazdani
PhD student, University of Geneva
Deep learning, Machine learning, Natural language processing, Digital Health
Jose F Rodrigues
Institute of Mathematics and Computer Science, University of Sao Paulo, Sao Carlos, Brazil
Douglas Teodoro
Professor, University of Geneva
biomedical NLP, machine learning for healthcare, medical informatics