🤖 AI Summary
Current breast cancer histopathology image retrieval methods rely heavily on large-scale annotated datasets, GPU resources, and deep learning models, which limits their accessibility and interpretability. To address these limitations, this paper proposes an unsupervised topological analysis framework. Using cubical persistent homology, it computes the first lightweight, interpretable “topological fingerprint” of an image, built from Betti numbers and persistence diagrams extracted directly from the RGB channels, with no training or annotations required. Coupled with an efficient distance metric, the fingerprint enables k-nearest-neighbor retrieval. On the BreaKHis dataset, the method outperforms existing supervised and unsupervised approaches, and processing the full dataset takes under 20 minutes on a CPU alone. The key contributions are: (1) the first application of cubical persistent homology to medical image retrieval; (2) a training-free, interpretable, low-resource topological encoding framework; and (3) empirical evidence that topological features capture fine-grained pathological similarity, supporting both the efficacy and the practical utility of the approach.
📝 Abstract
According to the World Health Organization, breast cancer claimed the lives of approximately 685,000 women in 2020. Early diagnosis and accurate clinical decision-making are critical in reducing this global burden. In this study, we propose THIR, a novel Content-Based Medical Image Retrieval (CBMIR) framework that leverages topological data analysis, specifically Betti numbers derived from persistent homology, to characterize and retrieve histopathological images based on their intrinsic structural patterns. Unlike conventional deep learning approaches that rely on extensive training, annotated datasets, and powerful GPU resources, THIR operates entirely without supervision. It extracts topological fingerprints directly from RGB histopathological images using cubical persistence, encoding the evolution of loops as compact, interpretable feature vectors. Similarity retrieval is then performed by computing distances between these topological descriptors, efficiently returning the top-K most relevant matches.
Extensive experiments on the BreaKHis dataset demonstrate that THIR outperforms state-of-the-art supervised and unsupervised methods. It processes the entire dataset in under 20 minutes on a standard CPU, offering a fast, scalable, and training-free solution for clinical image retrieval.
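The pipeline described in the abstract (per-channel topological descriptors followed by distance-based top-K retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a sublevel-set filtration over a hand-picked list of intensity thresholds, counts both connected components (Betti-0) and loops (Betti-1) at each threshold, and ranks images by L1 distance; none of these choices are confirmed by the abstract. All function names are hypothetical, and a real system would more likely use a persistent homology library than the flood fill used here.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(mask, neighbors):
    """Count connected components of True cells in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers(mask):
    """Return (Betti-0, Betti-1) of a binary image viewed as a cubical complex.

    Pixels are closed squares, so foreground pixels touching at a corner are
    connected (8-neighborhood). Loops are counted as bounded background
    regions under the 4-neighborhood.
    """
    b0 = count_components(mask, N8)
    cols = len(mask[0])
    # Pad the background with a one-cell border so the outside is one region.
    padded = ([[True] * (cols + 2)]
              + [[True] + [not v for v in row] + [True] for row in mask]
              + [[True] * (cols + 2)])
    b1 = count_components(padded, N4) - 1
    return b0, b1

def betti_curve(channel, thresholds):
    """Betti-0 values followed by Betti-1 values, one per threshold (assumed grid)."""
    curve0, curve1 = [], []
    for t in thresholds:
        mask = [[v <= t for v in row] for row in channel]  # sublevel set
        b0, b1 = betti_numbers(mask)
        curve0.append(b0)
        curve1.append(b1)
    return curve0 + curve1

def fingerprint(rgb_image, thresholds):
    """Concatenate the Betti curves of the R, G, and B channels."""
    vec = []
    for channel in rgb_image:  # rgb_image: three 2D lists of intensities
        vec.extend(betti_curve(channel, thresholds))
    return vec

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def retrieve(query_vec, database, k):
    """Return the names of the k database entries closest to the query."""
    ranked = sorted(database, key=lambda name: l1_distance(query_vec, database[name]))
    return ranked[:k]
```

In use, `fingerprint` would be computed once per image and stored in a dictionary keyed by image name, after which `retrieve(query_vec, database, k)` returns the top-K matches. Because the descriptors are short integer vectors, the ranking step is cheap even without an index, which is consistent with the CPU-only setting the abstract describes.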