🤖 AI Summary
Addressing the dual challenges of longitudinal covariate modeling and competing-risks assessment in survival analysis, this paper proposes what it describes as the first Transformer-based framework to use factorized self-attention for estimating multi-endpoint risk functions end-to-end from sequential covariate trajectories, while naturally accommodating censored data. The factorized self-attention mechanism is integrated into an extended Cox model for competing risks, and the model is jointly optimized for discriminative performance and calibration accuracy. The authors also systematically quantify calibration using Expected Calibration Error (ECE) and a Brier-score decomposition. Evaluated on multiple real-world clinical datasets, the model significantly outperforms state-of-the-art methods (p < 0.01): calibration error decreases by 32–47%, and AUC improves by 2.1–5.8 percentage points. This work addresses a longstanding gap in calibration evaluation for longitudinal survival modeling.
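The calibration metric named above, Expected Calibration Error, is a standard binned statistic; a minimal sketch of how it is typically computed for event probabilities at a fixed horizon is below (the function name and binning scheme are illustrative, not the paper's implementation):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean gap between predicted probability
    and observed event frequency within each probability bin.

    probs  : predicted event probabilities in [0, 1]
    labels : observed binary outcomes (1 = event occurred by the horizon)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include the right edge only in the final bin
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # bin weight = fraction of samples
    return ece
```

A perfectly calibrated predictor yields an ECE of zero; the 32–47% reductions reported above refer to decreases in this kind of gap statistic.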
📝 Abstract
Survival analysis is a critical tool for modeling time-to-event data. Recent deep learning-based models have relaxed various modeling assumptions, including proportional hazards and linearity. However, persistent challenges remain in incorporating longitudinal covariates, where prior work has largely focused on cross-sectional features, and in assessing the calibration of these models, where evaluation has focused primarily on discrimination. We introduce TraCeR, a transformer-based survival analysis framework for incorporating longitudinal covariates. Based on a factorized self-attention architecture, TraCeR estimates the hazard function from a sequence of measurements, naturally capturing temporal covariate interactions without assumptions about the underlying data-generating process. The framework is inherently designed to handle censored data and competing events. Experiments on multiple real-world datasets demonstrate that TraCeR achieves substantial and statistically significant performance improvements over state-of-the-art methods. Furthermore, our evaluation extends beyond discrimination metrics and assesses model calibration, addressing a key oversight in the literature.
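The factorized self-attention idea described in the abstract — attending along the time axis and the covariate axis separately, rather than over all time-covariate pairs at once — can be sketched in a few lines of numpy. This is a minimal single-head illustration under assumed shapes and an assumed softplus hazard head; it is not the paper's architecture:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over axis -2 of x.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, F, d, K = 6, 4, 8, 2  # timesteps, covariates, embed dim, competing events (assumed)
x = rng.normal(size=(T, F, d))  # one patient's embedded measurement sequence

# Factorized self-attention: attend along time (per covariate), then along
# covariates (per timestep), instead of full (T*F)^2 joint attention.
Wt = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
Wf = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
h = attend(np.swapaxes(x, 0, 1), *Wt)   # (F, T, d): time attention per covariate
h = attend(np.swapaxes(h, 0, 1), *Wf)   # (T, F, d): covariate attention per timestep

# Cause-specific hazard head (assumed): pool covariates, map to K nonnegative
# hazards per timestep via a softplus.
W_out = rng.normal(size=(d, K)) / np.sqrt(d)
hazards = np.log1p(np.exp(h.mean(axis=1) @ W_out))  # shape (T, K), all >= 0
```

The factorization reduces attention cost from O((T·F)^2) to O(T·F·(T+F)) per layer, which is what makes long measurement sequences with many covariates tractable.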