From Persistence to Survival: Hypothesis Testing, Effect Sizes and Vectorisation for Topological Features

📅 2026-06-10
🤖 AI Summary
Persistence diagrams lack a natural vector space structure, and the statistical methods for comparing them have developed largely apart from those used for downstream prediction. This work treats persistence diagrams as survival data and introduces STRAND, a unified framework built on the persistence survival function. From this single interpretable representation it derives hypothesis testing, effect size quantification, and a 1-Wasserstein-stable vectorisation, a combination the authors believe no earlier method provides. The resulting non-parametric two-sample test shows calibrated Type I error and high power on synthetic manifolds with controlled topology. The vectorisation is competitive across 14 graph and 3D point cloud benchmarks, and the method is applied to functional brain connectivity analysis in fMRI data.
📝 Abstract
Persistence diagrams (PDs) are common representations in topological data analysis, but they do not naturally live in a vector space, and the statistical tools developed for comparing them have largely evolved separately from those used for downstream prediction. We introduce STRAND (Survival Topological Representation ANalysis of Diagrams), which treats (collections of) PDs as survival data: each topological feature with birth $b$, death $d$ and persistence value $p = d - b$ is a fully observed time-to-event, and the persistence survival function $S(t) = \mathbb{P}(p > t)$ is the central object for comparing diagrams. From this single representation we derive (i) a non-parametric two-sample test with calibrated Type I error and high power from a small number of diagrams; (ii) interpretable effect sizes; and (iii) a 1-Wasserstein-stable feature vector for downstream machine learning. We validate calibration and power on synthetic manifolds with controlled topology, demonstrate competitive vectorisation across 14 graph and 3D point cloud benchmarks, and apply the method to study functional brain connectivity in fMRI data. To our knowledge, STRAND is the first method to provide hypothesis testing and vectorisation for persistence diagrams from a single coherent and interpretable representation.
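To make the central object concrete, the following is a minimal sketch of an empirical persistence survival function, sampled on a grid to give a fixed-length vector, followed by a generic label-permutation test on those vectors. It is not the STRAND implementation: the function names, the choice to drop features with infinite death time, the evaluation grid, the area-between-curves statistic and the permutation scheme are all assumptions made for illustration, and the abstract does not specify the test or effect sizes the authors use.

```python
import numpy as np


def persistence_values(diagram):
    """Return p = d - b for every (birth, death) pair with finite entries.

    Assumption: features with infinite death are dropped.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    finite = np.isfinite(diagram).all(axis=1)
    return diagram[finite, 1] - diagram[finite, 0]


def survival_vector(diagram, grid):
    """Empirical S(t): fraction of features with persistence > t, for t in grid."""
    grid = np.asarray(grid, dtype=float)
    p = np.sort(persistence_values(diagram))
    if p.size == 0:
        return np.zeros_like(grid)
    # searchsorted(side="right") counts values <= t, so the complement is P(p > t).
    return 1.0 - np.searchsorted(p, grid, side="right") / p.size


def area_between(curve_a, curve_b, grid):
    """Trapezoid-rule area between two survival curves (illustrative statistic)."""
    gap = np.abs(curve_a - curve_b)
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))


def permutation_pvalue(group_a, group_b, grid, n_perm=999, seed=0):
    """Generic permutation test on mean survival curves (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    curves = np.array([survival_vector(d, grid) for d in [*group_a, *group_b]])
    n_a = len(group_a)

    def statistic(order):
        mean_a = curves[order[:n_a]].mean(axis=0)
        mean_b = curves[order[n_a:]].mean(axis=0)
        return area_between(mean_a, mean_b, grid)

    observed = statistic(np.arange(len(curves)))
    exceed = sum(
        statistic(rng.permutation(len(curves))) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)


# Toy diagram: finite persistences are 1.0, 0.5, 2.0, 0.5; the last pair is dropped.
diagram = np.array([[0.0, 1.0], [0.0, 0.5], [0.0, 2.0], [0.5, 1.0], [0.25, np.inf]])
grid = np.array([0.0, 0.25, 0.5, 1.0, 2.0])
vec = survival_vector(diagram, grid)  # values: 1.0, 1.0, 0.5, 0.25, 0.0
```

Because every diagram maps to a vector of the same length, the same representation can feed both a group comparison and a standard classifier. In this sketch the grid resolution controls the trade-off between detail and dimensionality.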
Problem

Research questions and friction points this paper is trying to address.

persistence diagrams
topological data analysis
hypothesis testing
vectorisation
survival analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

persistence diagrams
topological data analysis
survival analysis
vectorisation
hypothesis testing