Functional Clustering of Survival Data via Smoothed Log-Hazard Trajectories: A Risk-Dynamics Perspective

📅 2026-05-31
🤖 AI Summary
Traditional survival analysis relies on cumulative survival probabilities, which can obscure how individual risk evolves over time. This work proposes a clustering framework based on instantaneous hazard trajectories: it first uses B-spline smoothing to estimate log-hazard functions, then applies functional principal component analysis (FPCA) to extract dynamic risk features, selecting the number of principal components with a 95% cumulative explained-variance criterion. Clustering is then performed on the unstandardized FPCA scores. The approach yields an interpretable clustering model built around how risk trajectories evolve over time. It is evaluated on simulated data and on two real clinical datasets, the German Breast Cancer Study and the Primary Biliary Cirrhosis cohort, where it handles crossing hazard curves, cohort imbalance, and outliers, and shows competitive within-cluster cohesion and explicit robustness diagnostics relative to cumulative-survival-based benchmarks.
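The summary does not say how the raw log-hazard is obtained before smoothing. The sketch below is one minimal way to picture that first step, assuming a piecewise-constant (occurrence/exposure) hazard estimate on time bins that is then smoothed with a cubic B-spline under a second-difference roughness penalty (a P-spline penalty, swapped in here for illustration). The helper names `bspline_basis`, `clamped_knots`, `binned_log_hazard`, and `smooth_log_hazard`, the bin edges, the number of interior knots `n_inner`, the penalty `lam`, and the continuity correction `eps` are all hypothetical choices, not the authors' settings.

```python
import numpy as np


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at the points x (Cox-de Boor recursion).

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the right end: x equal to the last knot belongs to the last non-empty interval.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time; 0/0 terms from repeated knots are treated as 0.
    for k in range(1, degree + 1):
        B_next = np.zeros((x.size, t.size - 1 - k))
        for i in range(t.size - 1 - k):
            left_den = t[i + k] - t[i]
            right_den = t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / left_den * B[:, i] if left_den > 0 else 0.0
            right = (t[i + k + 1] - x) / right_den * B[:, i + 1] if right_den > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B


def clamped_knots(a, b, n_inner, degree=3):
    """Boundary knots repeated degree + 1 times plus n_inner equally spaced interior knots."""
    return np.concatenate([np.full(degree, a), np.linspace(a, b, n_inner + 2), np.full(degree, b)])


def binned_log_hazard(time, event, edges, eps=0.5):
    """Raw log-hazard per bin: log(events / person-time at risk), a piecewise-constant estimate.

    eps is a hypothetical continuity correction that keeps empty bins finite.
    """
    n_bins = len(edges) - 1
    log_h = np.empty(n_bins)
    for j in range(n_bins):
        lo, hi = edges[j], edges[j + 1]
        # Person-time each subject spends inside [lo, hi).
        exposure = np.clip(np.minimum(time, hi) - lo, 0.0, None).sum()
        events = np.sum((event == 1) & (time >= lo) & (time < hi))
        log_h[j] = np.log((events + eps) / max(exposure, 1e-12))
    return log_h


def smooth_log_hazard(x, y, n_inner=6, degree=3, lam=1.0):
    """Penalized least-squares B-spline fit with a second-difference penalty on coefficients."""
    knots = clamped_knots(x.min(), x.max(), n_inner, degree)
    X = bspline_basis(x, knots, degree)
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)
    coef = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)
    return X @ coef, knots


# Hypothetical demo: constant hazard 0.2 with administrative censoring at t = 8.
rng = np.random.default_rng(0)
true_rate = 0.2
T = rng.exponential(1.0 / true_rate, size=20000)
time = np.minimum(T, 8.0)
event = (T <= 8.0).astype(int)
edges = np.linspace(0.0, 8.0, 17)
mids = 0.5 * (edges[:-1] + edges[1:])
raw = binned_log_hazard(time, event, edges)
smoothed, knots = smooth_log_hazard(mids, raw)
print(np.round(smoothed, 2))
```

On this simulated constant-hazard sample the smoothed curve should stay close to log(0.2); the penalty mainly tames the noisier late bins, where fewer subjects remain at risk.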
📝 Abstract
This paper investigates clustering in survival data by shifting the analytical focus from cumulative survival probabilities to instantaneous risk, as characterized by the hazard function. We model smoothed log-hazard trajectories as functional objects that capture the temporal evolution of risk and propose a clustering framework based on Functional Principal Component Analysis (FPCA) applied to B-spline-smoothed log-hazard trajectories. The number of retained functional principal components is selected before clustering using a 95% cumulative explained-variance rule, and clustering is then performed on the unstandardized FPCA scores. The proposed methodology is evaluated through simulation studies covering progressively complex scenarios, including overlapping and crossing hazard functions, cohort imbalance, heterogeneous risk profiles, and outlier contamination. The framework is further illustrated on two real-world clinical datasets, the German Breast Cancer Study and the Primary Biliary Cirrhosis dataset. Results show that the proposed log-hazard-based functional clustering framework provides an interpretable representation of relative temporal risk dynamics, with competitive internal cohesion and explicit robustness diagnostics when compared with cumulative-survival-based benchmarks.
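The FPCA and clustering steps can be sketched on log-hazard curves already evaluated on a common, equally spaced time grid. The sketch uses a discretized FPCA (eigendecomposition of the sample covariance with a quadrature weight), keeps the smallest number of components whose cumulative explained variance reaches 95%, and clusters the unstandardized scores. The abstract does not name the clustering algorithm, so k-means with farthest-first initialization is an assumption here, as are the simulated crossing linear log-hazards and the hypothetical names `fpca_scores` and `kmeans`.

```python
import numpy as np


def fpca_scores(curves, grid, var_threshold=0.95):
    """Discretized FPCA of curves sampled on an equally spaced grid.

    Returns unstandardized scores for the smallest number of components whose
    cumulative explained variance reaches var_threshold.
    """
    dt = grid[1] - grid[0]                      # quadrature weight for the L2 inner product
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (curves.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov * dt)     # approximates the covariance operator
    order = np.argsort(evals)[::-1]             # sort eigenpairs by decreasing variance
    evals = np.clip(evals[order], 0.0, None)
    evecs = evecs[:, order]
    cum = np.cumsum(evals) / evals.sum()        # cumulative explained-variance ratio
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, len(evals))
    phi = evecs[:, :k] / np.sqrt(dt)            # eigenfunctions with unit L2 norm
    scores = centered @ phi * dt                # xi_ik = integral of (X_i - mean) * phi_k
    return scores, k, cum


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < n_clusters:
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[np.argmax(d2)])        # next center: point farthest from current ones
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical demo: two groups of linear log-hazards that cross at t = 5.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 10.0, 101)
n_per_group = 30
truth = np.repeat([0, 1], n_per_group)
slopes = np.where(truth == 0, 0.2, -0.2)
curves = (-2.0 + slopes[:, None] * (grid - 5.0)[None, :]
          + rng.normal(0.0, 0.1, size=(2 * n_per_group, 1))            # subject-level shift
          + rng.normal(0.0, 0.05, size=(2 * n_per_group, grid.size)))  # pointwise noise
scores, k, cum = fpca_scores(curves, grid)
labels, _ = kmeans(scores, n_clusters=2)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))  # allow label switching
print(k, round(float(cum[k - 1]), 3), agreement)
```

Leaving the scores unstandardized means each component enters the Euclidean distance in proportion to its variance, so the dominant modes of risk dynamics (in this toy example, the slope difference that makes the hazards cross) drive the partition, while low-variance components are not inflated to equal weight.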
Problem

Research questions and friction points this paper is trying to address.

survival data
clustering
hazard function
risk dynamics
functional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

functional clustering
log-hazard trajectories
Functional Principal Component Analysis
B-spline smoothing
risk dynamics