Comparative Analysis of Stroke Prediction Models Using Machine Learning

📅 2025-05-14

📈 Citations: 0

✨ Influential: 0

career value

186K/year

🤖 AI Summary

This study addresses critical challenges in stroke risk prediction—namely, insufficient model sensitivity, severe class imbalance, and pervasive missing data. We systematically develop and compare logistic regression, random forest, and XGBoost models, integrating SMOTE-based oversampling, multiple imputation, and SHAP-based interpretability analysis. For the first time in stroke prediction, we quantitatively evaluate the sensitivity bottlenecks of these three mainstream algorithms and identify age, hypertension, fasting blood glucose, and BMI as the four most interpretable, dominant predictive features. Our clinically oriented optimization strategy elevates XGBoost sensitivity to 78%—a statistically significant improvement over baseline—while maintaining accuracy above 92% and high specificity. This enhances early identification of high-risk individuals without compromising diagnostic reliability. The framework provides a methodologically rigorous and empirically validated foundation for deploying interpretable, real-world stroke risk screening tools.

Technology Category

Application Category

📝 Abstract

Stroke remains one of the most critical global health challenges, ranking as the second leading cause of death and the third leading cause of disability worldwide. This study explores the effectiveness of machine learning algorithms in predicting stroke risk using demographic, clinical, and lifestyle data from the Stroke Prediction Dataset. By addressing key methodological challenges such as class imbalance and missing data, we evaluated the performance of multiple models, including Logistic Regression, Random Forest, and XGBoost. Our results demonstrate that while these models achieve high accuracy, sensitivity remains a limiting factor for real-world clinical applications. In addition, we identify the most influential predictive features and propose strategies to improve machine learning-based stroke prediction. These findings contribute to the development of more reliable and interpretable models for the early assessment of stroke risk.

Problem

Research questions and friction points this paper is trying to address.

Evaluating machine learning models for stroke risk prediction

Addressing class imbalance and missing data in stroke datasets

Identifying key features to improve prediction sensitivity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning models predict stroke risk

Address class imbalance and missing data

Identify influential features for improvement

🔎 Similar Papers

No similar papers found.