XGBoost-Based Prediction of ICU Mortality in Sepsis-Associated Acute Kidney Injury Patients Using MIMIC-IV Database with Validation from eICU Database

📅 2025-02-25

🏛️ medRxiv

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the prediction of in-hospital mortality among intensive care unit (ICU) patients with sepsis-associated acute kidney injury (SA-AKI). We developed and validated an interpretable XGBoost model using 9,474 SA-AKI patients from the MIMIC-IV database. Feature selection combined variance inflation factor (VIF) analysis, recursive feature elimination (RFE), and clinical expert input, narrowing the candidates to 24 predictors. To keep the model clinically interpretable, we used SHAP and LIME as complementary explanation frameworks. Internal validation yielded an AUROC of 0.878 (95% CI: 0.859–0.897), and the model was externally validated on the eICU database. Key predictive features included SOFA score and serum lactate level. The model is intended to support early risk stratification and targeted interventions for SA-AKI patients in the ICU.

📝 Abstract

Background: Sepsis-Associated Acute Kidney Injury (SA-AKI) leads to high mortality in intensive care. This study develops machine learning models using the Medical Information Mart for Intensive Care IV (MIMIC-IV) database to predict Intensive Care Unit (ICU) mortality in SA-AKI patients. External validation is conducted using the eICU Collaborative Research Database. Methods: For 9,474 identified SA-AKI patients in MIMIC-IV, key features such as lab results, vital signs, and comorbidities were selected using Variance Inflation Factor (VIF), Recursive Feature Elimination (RFE), and expert input, narrowing to 24 predictive variables. An Extreme Gradient Boosting (XGBoost) model was built for in-hospital mortality prediction, with hyperparameters optimized using GridSearch. Model interpretability was enhanced with SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). External validation was conducted using the eICU database. Results: The proposed XGBoost model achieved an internal Area Under the Receiver Operating Characteristic curve (AUROC) of 0.878 (95% Confidence Interval: 0.859 to 0.897). SHAP identified Sequential Organ Failure Assessment (SOFA), serum lactate, and respiratory rate as key mortality predictors. LIME highlighted serum lactate, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, total urine output, and serum calcium as critical features. Conclusions: The integration of advanced techniques with the XGBoost algorithm yielded a highly accurate and interpretable model for predicting SA-AKI mortality across diverse populations. It supports early identification of high-risk patients, enhancing clinical decision-making in intensive care. Future work needs to focus on enhancing adaptability, versatility, and real-world applications.
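
The abstract reports an AUROC together with a 95% confidence interval but does not say how the interval was obtained. As an illustration only, the sketch below computes AUROC with the rank-sum (Mann-Whitney) formulation and a percentile bootstrap interval. The bootstrap choice, the function names `auroc` and `bootstrap_ci`, and the defaults `n_boot=1000` and `seed=0` are assumptions made for this example, not details from the paper.

```python
import random


def auroc(y_true, y_score):
    """AUROC from the rank-sum (Mann-Whitney U) statistic; labels must be 0 or 1."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, see lead-in)."""
    n = len(y_true)
    if sum(y_true) in (0, n):
        raise ValueError("bootstrap needs both classes in the data")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if 0 < sum(labels) < n:  # skip resamples that contain a single class
            stats.append(auroc(labels, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

In use, `y_true` would hold the observed mortality labels and `y_score` the model's predicted probabilities on the validation set, so `auroc(y_true, y_score)` gives the point estimate and `bootstrap_ci(y_true, y_score)` the interval around it.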

Problem

Research questions and friction points this paper is trying to address.

Predict ICU mortality in SA-AKI patients

Use XGBoost model for mortality prediction

Validate model with eICU database

Innovation

Methods, ideas, or system contributions that make the work stand out.

XGBoost model for mortality prediction

Feature selection using VIF and RFE

SHAP and LIME for model interpretability
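
To make the VIF screening step concrete, here is a minimal sketch that computes a variance inflation factor for each candidate feature. It relies on the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse of the feature correlation matrix. The function names are hypothetical, and the paper's summary does not state which VIF cutoff was applied, so any threshold (5 and 10 are common conventions) is an assumption.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [sum(x * x for x in col) ** 0.5 for col in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif(columns):
    """VIF per feature: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

Features whose VIF exceeds the chosen cutoff would be candidates for removal before a wrapper method such as RFE ranks the remainder. With two features the result reduces to 1 / (1 - r²), where r is their correlation, which gives a quick sanity check.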

🔎 Similar Papers

Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality