AI Summary
Outliers severely compromise the statistical consistency of persistent homology. Method: This paper introduces the Median-of-Means Distance (MoM Dist), the first distance function for topological data analysis (TDA) that incorporates the Median-of-Means (MoM) estimation paradigm from robust statistics. MoM Dist constructs a robust sublevel set filtration and a weighted filtration, enabling consistent estimation of the underlying population's true topological structure even in the presence of outliers. Contribution/Results: We establish theoretical guarantees showing that the induced filtration satisfies strong consistency and near-minimax optimality. Empirical evaluations demonstrate that MoM Dist significantly outperforms standard distance functions under both stochastic noise and adversarial perturbations. This work establishes a new paradigm for robust topological inference, bridging robust statistics and TDA to enhance reliability in real-world, outlier-contaminated settings.
Abstract
The distance function to a compact set plays a crucial role in the paradigm of topological data analysis. In particular, the sublevel sets of the distance function are used in the computation of persistent homology -- a backbone of the topological data analysis pipeline. Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers. In this work, we develop a framework of statistical inference for persistent homology in the presence of outliers. Drawing inspiration from recent developments in robust statistics, we propose a \textit{median-of-means} variant of the distance function (\textsf{MoM Dist}) and establish its statistical properties. In particular, we show that, even in the presence of outliers, the sublevel filtrations and weighted filtrations induced by \textsf{MoM Dist} are both consistent estimators of the true underlying population counterpart and exhibit near minimax-optimal performance in adversarial settings. Finally, we demonstrate the advantages of the proposed methodology through simulations and applications.
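The abstract does not spell out the construction, so the following is only a minimal sketch of one natural reading of a median-of-means distance: split the sample into Q disjoint blocks, compute the ordinary distance from a query point to each block, and take the median of those Q values. The function name `mom_dist`, the random partition, the toy circle data, and the choice of 21 blocks are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def mom_dist(query, sample, n_blocks, rng=None):
    """Median over blocks of the distance from each query point to a block.

    Hypothetical helper: a sketch of a median-of-means distance, not the
    authors' reference implementation.
    """
    rng = np.random.default_rng(rng)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    sample = np.asarray(sample, dtype=float)

    # Randomly partition the sample indices into n_blocks disjoint blocks.
    blocks = np.array_split(rng.permutation(len(sample)), n_blocks)

    # block_dists[q, i] = distance from query i to the nearest point of block q.
    block_dists = np.stack([
        np.linalg.norm(query[:, None, :] - sample[idx][None, :, :], axis=2).min(axis=1)
        for idx in blocks
    ])
    return np.median(block_dists, axis=0)


# Toy data: 200 points on the unit circle plus 5 outliers at its centre.
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
outliers = np.zeros((5, 2))
sample = np.vstack([circle, outliers])

# Distance from the centre to the sample: plain versus median-of-means.
plain = np.linalg.norm(sample, axis=1).min()
robust = mom_dist([0.0, 0.0], sample, n_blocks=21, rng=0)[0]
print(plain, robust)
```

In this toy setting the five outliers can contaminate at most five of the 21 blocks, so the median is taken over a majority of clean blocks and the plain distance's collapse at the centre of the circle is avoided. This suggests choosing the number of blocks to exceed twice the number of suspected outliers, though the precise condition used in the paper is not stated in the abstract.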