Can-SAVE: Mass Cancer Risk Prediction via Survival Analysis Variables and EHR

📅 2023-09-26
📈 Citations: 1
Influential: 0
🤖 AI Summary
Conventional cancer screening is costly and relies on specialized modalities (e.g., imaging or genomics), which limits scalability. Method: We propose a large-scale risk prediction framework that uses sparse sequences of high-level medical events from routine electronic health records (EHRs). The approach integrates survival analysis variables into gradient-boosting models (XGBoost/LightGBM), so it needs neither deep clinical data nor high-performance computing. It relies on survival-aware feature engineering and medical event sequence encoding. Results: On a retrospective cohort of more than 1.1 million individuals, the method achieves an Average Precision of 22.8% ± 2.7% versus 15.1% ± 2.6% for the baselines, a relative improvement of about 51%. The cancer detection rate at TOP@1k improves 4.7–6.4× over the traditional healthcare risk estimation approach. In an oncologist-supervised experiment, up to 84 of every 1,000 selected individuals were cancer patients, compared with a typical age-specific Number Needed to Screen estimate of 9 out of 1,000 (for colorectal cancer).
📝 Abstract
Specific medical cancer screening methods are often costly, time-consuming, and poorly suited to large-scale use. Advanced Artificial Intelligence (AI) methods greatly help cancer detection but require specific or deep medical data. These aspects prevent the mass implementation of cancer screening methods. For this reason, applying AI methods to mass personalized assessment of cancer risk among patients, based on the existing volume of Electronic Health Records (EHR), would be a disruptive change for healthcare. This paper presents Can-SAVE, a novel cancer risk assessment method that combines a survival analysis approach with a gradient-boosting algorithm. It is highly accessible and resource-efficient, utilizing only a sequence of high-level medical events. We tested the proposed method in a long-term retrospective experiment covering more than 1.1 million people and four regions of Russia. The Can-SAVE method significantly exceeds the baselines on the Average Precision metric: 22.8% ± 2.7% vs 15.1% ± 2.6%. An extensive ablation study also confirmed the proposed method's dominant performance. The experiment supervised by oncologists shows a reliable cancer patient detection rate of up to 84 out of 1000 selected. Such results surpass the estimates for medical screening strategies; the typical age-specific Number Needed to Screen is only 9 out of 1000 (for colorectal cancer). Overall, our experiments show a 4.7-6.4 times improvement in cancer detection rate (TOP@1k) compared to the traditional healthcare risk estimation approach.
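The two ranking metrics named in the abstract, Average Precision and the detection count among the top-k ranked patients, can be illustrated with a minimal sketch. This uses the common ranking definition of Average Precision; the paper's exact evaluation protocol is not given here, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@rank, taken at the rank of every positive example."""
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # With no positives the metric is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


def top_k_detections(labels: Sequence[int], scores: Sequence[float], k: int) -> int:
    """Number of true cancer cases among the k highest-scored patients."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k])


# Toy data (hypothetical): 1 = later diagnosed with cancer, 0 = not diagnosed.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]

print(round(average_precision(toy_labels, toy_scores), 3))  # 0.833
print(top_k_detections(toy_labels, toy_scores, k=2))        # 1
```

In this reading, a result such as "84 out of 1000 selected" corresponds to `top_k_detections(..., k=1000)` returning 84.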
Problem

Research questions and friction points this paper is trying to address.

Develops a low-cost AI system for population-scale cancer screening
Uses medical history events to rank cancer risks before symptoms appear
Achieves higher detection rates and broader coverage than conventional methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses survival model outputs in gradient-boosting for risk patterns
Ranks cancer risks solely from medical history events
Processes one million patients in three hours on standard hardware
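One way to read "survival model outputs in gradient-boosting" is to turn a survival estimate into an extra feature column. The sketch below is an assumption, not the authors' implementation: it swaps in a plain Kaplan-Meier estimator (the summary does not say which survival models Can-SAVE uses), and the helper names, toy data, and feature layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: [(t, S(t))] at each distinct event time."""
    # Only times with an observed event (events == 1) change the curve.
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    curve = []
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        observed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve: Sequence[Tuple[float, float]], t: float) -> float:
    """Step-function lookup; S(t) is 1.0 before the first event time."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value


# Toy cohort (hypothetical): years of follow-up and whether cancer was diagnosed.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)

# Hypothetical feature row for one patient: [number of EHR events, 1 - S(t)].
patient_event_count = 12
patient_followup_years = 4
feature_row = [patient_event_count, 1.0 - survival_at(curve, patient_followup_years)]
print(feature_row)
```

Rows built this way could then be passed, alongside features derived from the medical event sequence, to a gradient-boosting classifier such as the XGBoost or LightGBM models named in the summary.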
Petr Philonenko
Sber AI Lab
V. Kokh
Sber AI
Pavel Blinov
Unknown affiliation
machine learning, natural language processing