🤖 AI Summary
Pancreatic cancer lacks effective early screening methods, and most patients are diagnosed too late for curative intervention. This study addresses the problem with a custom Transformer model trained on longitudinal electronic health record diagnoses and blood test values. Using multi-head attention, the model predicts an individual's risk of pancreatic cancer 1 to 3 years before diagnosis, and a Bayesian update of population prevalence makes the risk estimates transportable across healthcare settings for targeted screening. In leave-one-site-out external validation, the model achieved mean AUCs of 0.837, 0.797, and 0.760 for 1-, 2-, and 3-year predictions, respectively, and its risk estimates were well calibrated (slope = 1.08; Brier score = 0.025). At a screening threshold of >3.3% 1-year risk, it yielded a diagnostic odds ratio of 18.2, laying groundwork for population-level risk-enriched screening.
📝 Abstract
Earlier detection of pancreatic cancer is key to enabling wider access to curative treatment and reducing cancer deaths; however, screening is presently not viable. Latent indicators of pathology are evident in an individual's disease and blood test trajectories and may predict the development of pancreatic cancer. Longitudinal sequences of coded diagnoses and blood test values accrued by patients throughout their clinical interactions were used to train a custom Transformer-based neural network with a multi-head attention mechanism to predict risk of pancreatic cancer with a multi-year lead time and risk-stratify populations for targeted screening. The cohort comprised 6,017 adults with pancreatic cancer and 177,081 controls (overall median age 75, 45% female) with a median of 12 years (interquartile range 6.9-16.2) of medical history prior to pancreatic cancer diagnosis. External validation via leave-one-site-out, out-of-sample testing, predicting pancreatic cancer 1, 2, and 3 years prior to diagnosis, demonstrated mean area under the receiver operating characteristic curve of 0.837 (95% confidence interval 0.827-0.848), 0.797 (95% confidence interval 0.782-0.813), and 0.760 (95% confidence interval 0.745-0.776), respectively. Estimated pancreatic cancer risks were well calibrated (calibration plot slope 1.08, intercept -0.077; Brier score 0.025), and a Bayesian population pancreatic cancer prevalence update allows estimated cancer risk outputs to be transportable across settings. At testing, a screening threshold of >3.3% risk of pancreatic cancer within 1 year offered a diagnostic odds ratio of 18.2. Our work therefore lays the foundation for a first population-level digital enrichment tool to widen access to curative-intent management of pancreatic cancer.
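The abstract does not spell out the form of the Bayesian prevalence update, so the following is only a minimal sketch of one standard way such an update can be done: rescale the predicted odds by the ratio of target-population to training-cohort prevalence odds (a prior-shift correction from Bayes' theorem). The function names, the target prevalence of 0.1%, and the use of the cohort case fraction as the source prevalence are illustrative assumptions, not details confirmed by the paper. A helper for the diagnostic odds ratio from a 2x2 confusion table is included for reference.

```python
def odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def adjust_risk(risk: float, source_prevalence: float, target_prevalence: float) -> float:
    """Re-express a predicted risk for a population with a different prevalence.

    Assumption: the model's likelihood ratio is unchanged between settings,
    so only the prior (prevalence) odds need to be swapped.
    """
    for name, value in (("risk", risk),
                        ("source_prevalence", source_prevalence),
                        ("target_prevalence", target_prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    adjusted_odds = odds(risk) * odds(target_prevalence) / odds(source_prevalence)
    return adjusted_odds / (1.0 + adjusted_odds)


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Diagnostic odds ratio: (TP * TN) / (FP * FN)."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Case fraction in the study cohort (6,017 cases, 177,081 controls).
# Using it as the source prevalence is an assumption for illustration.
COHORT_PREVALENCE = 6017 / (6017 + 177081)

if __name__ == "__main__":
    # Hypothetical target prevalence of 0.1%; not a figure from the paper.
    adjusted = adjust_risk(0.033, COHORT_PREVALENCE, 0.001)
    print(f"Risk of 3.3% in the cohort maps to {adjusted:.5f} in the target population")
```

Under this sketch, a risk threshold chosen in one setting would be re-derived after the adjustment rather than reused unchanged, because the same raw model output corresponds to a different absolute risk when the underlying prevalence differs.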