AI Summary
Long-term health risk prediction for chronic disease patients remains difficult, and existing models lack interpretability. Method: We propose an interpretable multimorbidity monitoring system that uses routine electronic health records (EHRs) to predict 3-, 6-, and 12-month disease exacerbation risks without requiring laboratory tests. Our approach introduces a rule-augmented random forest framework that integrates SHAP-based feature attribution with a clinically validated surrogate model and formalizes decision logic via a structured rule engine; all interpretations were rigorously validated by multidisciplinary clinical experts. Contribution/Results: Evaluated on multicenter real-world EHR data with internal cross-validation, the system achieves high discriminative performance (AUROC >0.85) and robustness (F1 >0.78), significantly outperforming baseline models. It has been deployed in the CureMD EMR platform, enabling real-time clinical risk stratification and evidence-informed intervention decisions.
Abstract
This study addresses a critical gap in the healthcare system by developing a clinically meaningful, practical, and explainable disease surveillance system for multiple chronic diseases, using routine EHR data from multiple U.S. practices integrated with CureMD's EMR/EHR system. Unlike traditional systems, whose AI models rely on features from patients' lab results, our approach focuses on routinely available data, such as medical history, vitals, diagnoses, and medications, to preemptively assess the risk of chronic diseases over the following year. We trained three distinct models for each chronic disease: prediction models that forecast the risk of a disease 3, 6, and 12 months before a potential diagnosis. We developed Random Forest models, which were internally validated using F1 scores and AUROC as performance metrics and further evaluated by a panel of expert physicians for clinical relevance based on inferences grounded in medical knowledge. We also describe how we integrated these models into a practical EMR system. Beyond using Shapley attributions and surrogate models for explainability, we introduce a new rule-engineering framework to enhance the intrinsic explainability of Random Forests.
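The two validation metrics named above, AUROC and F1, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `auroc` and `f1_at_threshold`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher risk score (ties count as half), and F1 is the harmonic mean of precision and recall at a fixed threshold.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 score after turning risk scores into 0/1 predictions at `threshold`."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = disease diagnosed within the horizon, 0 = not.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

print(f"AUROC = {auroc(labels, scores):.4f}")          # AUROC = 0.9375
print(f"F1    = {f1_at_threshold(labels, scores):.4f}")  # F1    = 0.7500
```

AUROC does not depend on a threshold, whereas F1 does, so the two metrics can disagree on which model is better when the operating threshold is poorly chosen. The pairwise loop is quadratic in the number of patients; a rank-based formulation would be preferable on real cohorts.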