MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU

📅 2025-10-28
🤖 AI Summary
Existing ICU-based sepsis studies suffer from outdated data, non-reproducible preprocessing, and insufficient coverage of therapeutic interventions. To address these limitations, this work constructs a standardized sepsis cohort (n=35,239) from MIMIC-IV that adheres to the Sepsis-3 definition and integrates time-aligned clinical variables with treatment data, including vasopressors, fluid administration, mechanical ventilation, and antibiotics. It provides a transparent, reproducible preprocessing pipeline with structured missing-value imputation and establishes three benchmark tasks: early mortality prediction, length-of-stay estimation, and shock onset classification. Experimental results show that incorporating treatment variables substantially improves model performance, particularly for Transformer architectures. This study introduces the first open-source, reproducible benchmark platform specifically designed for sequential modeling in critical care, enabling standardized, comparable sepsis prediction research.

📝 Abstract
Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage of clinical interventions. We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database, designed to support reproducible modeling of sepsis trajectories. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data, including vasopressors, fluids, mechanical ventilation, and antibiotics. We describe a transparent preprocessing pipeline (based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion) and release it alongside benchmark tasks focused on early mortality prediction, length-of-stay estimation, and shock onset classification. Empirical results demonstrate that incorporating treatment variables substantially improves model performance, particularly for Transformer-based architectures. MIMIC-Sepsis serves as a robust platform for evaluating predictive and sequential models in critical care research.
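
As a rough illustration of the Sepsis-3 criterion the cohort is built on (suspected infection together with an acute rise of at least 2 SOFA points), the Python sketch below flags the first qualifying hour in a series of SOFA scores. It is a simplified assumption for illustration, not the MIMIC-Sepsis pipeline: the function name, the hourly scoring, and the default baseline of 0 are hypothetical, and the time window around suspected infection that a real cohort definition would use is omitted.

```python
from typing import Optional, Sequence


def sepsis3_onset_hour(
    sofa_by_hour: Sequence[int],
    baseline_sofa: int = 0,          # assumed baseline when none is known
    suspected_infection: bool = True,
) -> Optional[int]:
    """Return the first hour index at which SOFA is >= 2 points above
    baseline, or None if the criterion is never met.

    Hypothetical helper: a simplified reading of Sepsis-3, not the
    paper's actual cohort logic.
    """
    # Sepsis-3 requires suspected infection in addition to organ dysfunction.
    if not suspected_infection:
        return None
    for hour, score in enumerate(sofa_by_hour):
        if score - baseline_sofa >= 2:
            return hour
    return None
```

A patient with scores `[0, 1, 3, 4]` and a baseline of 0 would be flagged at hour index 2 under this simplified rule.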

Problem

Research questions and friction points this paper is trying to address.

Addressing outdated datasets and non-reproducible sepsis research pipelines
Providing standardized clinical intervention data for ICU sepsis trajectories
Improving predictive model performance through treatment variable integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated cohort with standardized treatment data
Transparent preprocessing pipeline with structured imputation
Transformer-based models enhanced by treatment variables
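
The imputation and treatment-integration ideas listed above can be sketched in a few lines of Python. The abstract does not spell out the imputation rules, so forward-filling with a fixed default is an assumption made here for illustration, as are all names (`forward_fill`, `build_features`, the variable keys) and the choice to treat a missing treatment record as "not given".

```python
from typing import Dict, List, Optional, Sequence

Series = Sequence[Optional[float]]


def forward_fill(values: Series, default: float) -> List[float]:
    """Carry the last observed value forward; use `default` before the
    first observation. (Assumed strategy, not taken from the paper.)"""
    filled: List[float] = []
    last = default
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def build_features(
    vitals: Dict[str, Series],
    treatments: Dict[str, Series],
    defaults: Dict[str, float],
    include_treatments: bool = True,
) -> List[List[float]]:
    """Return one feature row per time step.

    Columns are the imputed vitals (sorted by name), optionally followed
    by treatment variables (sorted by name).
    """
    columns = [forward_fill(vitals[name], defaults[name]) for name in sorted(vitals)]
    if include_treatments:
        for name in sorted(treatments):
            # Assumption: a missing treatment record means no treatment (0.0).
            columns.append([0.0 if v is None else v for v in treatments[name]])
    # Transpose from per-variable columns to per-time-step rows.
    return [list(row) for row in zip(*columns)]
```

Toggling `include_treatments` yields the two input variants (with and without treatment variables) that a benchmark comparison of this kind would feed to the same model.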
Authors
Yong Huang
Department of Computer Science, University of California, Irvine, Irvine, California
Zhongqi Yang
University of California, Irvine
Digital Health, Machine Learning, Personalization, LLMs
Amir Rahmani
NASA Jet Propulsion Laboratory