Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

📅 2026-06-09
🤖 AI Summary
Existing atrial fibrillation (AF) risk scores rely heavily on factors such as advanced age and hypertension, which are nearly ubiquitous among patients with cardiovascular disease, so they stratify medium-term risk poorly in this group. This study develops an NLP pipeline that combines rule-based parsing with Transformer-based named entity recognition to extract structured features from discharge reports, and trains interpretable machine learning models on those features with LightAutoML. It proposes Pre-AF 13, a 13-feature model for medium-term AF risk prediction, and Pre-AF 9, a streamlined 9-variable linear score designed for rapid bedside assessment. In 17,562 eligible patients drawn from the records of 45,000, Pre-AF 13 achieved a ROC AUC of 0.725 for predicting AF within 24 months, outperforming CHARGE-AF, C2HEST, MHS, and HAVOC (ROC AUC 0.53-0.64), while Pre-AF 9 stratified observed 24-month AF incidence from about 7% to 36%.
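As a rough illustration of the rule-based half of such a pipeline, the sketch below extracts one numeric feature from free text with a regular expression. The example sentences, the pattern, and the function name `extract_la_volume` are all hypothetical. The study's actual rules, report language, and feature definitions are not given on this page, and the Transformer-based NER step is not shown.

```python
import re
from typing import Optional

# Hypothetical pattern: "left atrial volume" or "LA volume", a short
# non-digit gap (colon, spaces), a number, then the unit "ml".
LA_VOLUME_RE = re.compile(
    r"(?:left\s+atrial|LA)\s+volume\D{0,20}?(\d+(?:[.,]\d+)?)\s*ml",
    re.IGNORECASE,
)


def extract_la_volume(report: str) -> Optional[float]:
    """Return the first left atrial volume (ml) found in the text, or None."""
    match = LA_VOLUME_RE.search(report)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    return float(match.group(1).replace(",", "."))


if __name__ == "__main__":
    # Invented example text, not taken from any real discharge report.
    print(extract_la_volume("Echocardiography: LA volume 78 ml, EF 55%."))
    print(extract_la_volume("Left atrial volume: 64,5 ml"))
    print(extract_la_volume("No echocardiography performed"))
```

A rule like this only covers phrasings anticipated in advance, which is one plausible reason to pair it with a learned NER model for less regular mentions.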
📝 Abstract
Background. Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia and a major determinant of prognosis. Established AF risk scores rely on factors (older age, hypertension) nearly ubiquitous among patients with cardiovascular disease (CVD), offering limited stratification in this high-risk group. Most target long-term (5-10 year) rather than medium-term prediction. We developed interpretable ML models predicting AF risk over a 24-month and entire follow-up horizon in CVD patients using routinely collected hospital data.

Methods. Single-center retrospective study of electronic health records from the National Research Cardiology Center (Russia) for patients aged >=18 with CVD but without pre-existing AF, hospitalized more than once between January 2012 and May 2019. A custom NLP pipeline transformed unstructured discharge reports into 73 structured features, combining a rule-based parser with transformer-based NER. Using LightAutoML we built a full model (73 features), a simple model (reduced subset), and a linear model for a bedside risk score. Performance was assessed by ROC AUC, compared with CHARGE-AF, C2HEST, MHS, and HAVOC, and interpreted via SHAP.

Results. Of 80,576 records from 45,000 patients, 17,562 met inclusion criteria; 1,438 (8.19%) developed AF. The full model reached ROC AUC 0.735 (24-month) and 0.696 (entire follow-up); the simple model was nearly identical (0.725, 0.696). All non-linear models outperformed the four clinical risk scores (ROC AUC 0.53-0.64). The simple model uses 13 features and is named Pre-AF 13. SHAP identified age and left atrial volume as dominant predictors. A linear risk score (Pre-AF 9) stratified observed 24-month AF incidence from ~7% to 36%.

Conclusion. Interpretable ML models built from routinely collected EHR data identify high-AF-risk CVD patients, outperforming established clinical risk scores.
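The two evaluation quantities named in the abstract, ROC AUC and observed incidence per risk stratum, can be sketched in a few lines of plain Python. This is a minimal illustration on invented toy data, not the study's evaluation code. The function names, the toy labels and scores, and the cutoffs are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half (pairwise definition of AUC)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def incidence_by_stratum(
    labels: Sequence[int], scores: Sequence[float], cutoffs: Sequence[float]
) -> List[Optional[float]]:
    """Observed event rate in each score band; cutoffs must be ascending.
    A score equal to a cutoff falls into the higher band."""
    bands: List[List[int]] = [[] for _ in range(len(cutoffs) + 1)]
    for y, s in zip(labels, scores):
        bands[sum(s >= c for c in cutoffs)].append(y)
    return [sum(b) / len(b) if b else None for b in bands]


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = developed AF, 0 = did not.
    labels = [0, 0, 0, 1, 0, 1, 1, 0]
    scores = [1, 2, 3, 3, 4, 5, 6, 2]
    print("ROC AUC:", roc_auc(labels, scores))
    print("Incidence per band:", incidence_by_stratum(labels, scores, [3, 5]))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a demonstration; at the scale of the study a rank-based or library implementation would be the practical choice.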
Problem

Research questions and friction points this paper is trying to address.

atrial fibrillation
risk prediction
cardiovascular disease
medium-term risk
clinical risk scores
Innovation

Methods, ideas, or system contributions that make the work stand out.

interpretable machine learning
natural language processing
atrial fibrillation risk prediction
electronic health records
SHAP interpretability
Olga Shakhmatova
National Medical Research Center of Cardiology named after Academician E.I. Chazov, Moscow, Russia
Dmitrii Kriukov
Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia; Artificial Intelligence Research Institute (AIRI), Moscow, Russia
Daniil Larionov
Universität Mannheim
Natural Language Processing, Deep Learning, Computational Linguistics
Nikita Khromov
Russian Center for Scientific Information (RCSI), Moscow, Russia
Iaroslav Bespalov
Artificial Intelligence Research Institute (AIRI), Moscow, Russia; Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
Alexander Zolotarev
Institute of Cyber Intelligence Systems, National Research Nuclear University MEPhI, Moscow, Russia
Kirill Grishchenkov
M.V. Lomonosov Moscow State University, Moscow, Russia
Ekaterina Ivanova
Queen Mary University of London
human-robot interaction, haptic communication, neurorehabilitation
Miron Kuznetsov
Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
Ilya Sochenkov
M.V. Lomonosov Moscow State University, Moscow, Russia; Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow, Russia; Ivannikov Institute for System Programming of the Russian Academy of Sciences (ISP RAS), Moscow, Russia; Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS), Moscow, Russia
Elizaveta Panchenko
National Medical Research Center of Cardiology named after Academician E.I. Chazov, Moscow, Russia
Artem Shelmanov
MBZUAI
uncertainty estimation, fairness, active learning, nlp, deep learning
Dmitry V. Dylov
Associate Professor, Computational Imaging Lab
applied mathematics, computational imaging