🤖 AI Summary
To address computational bottlenecks arising from the enormous size and structural complexity of whole-slide images (WSIs), this paper proposes a scalable, compact representation framework. First, WSIs are partitioned into patches, and features from a pre-trained CNN are extracted for each patch. Next, K-means clustering groups the patches into semantically coherent clusters. For each cluster, a Gaussian mixture model (GMM) is fitted and a Fisher vector (FV) encoding is computed. Finally, the cluster-level FVs are concatenated to form a global WSI representation. The paper presents this as the first work to jointly leverage K-means clustering and Fisher vector aggregation for WSI representation, achieving linear time complexity while preserving both local semantic interpretability and global discriminability. Evaluated on multiple public WSI benchmarks, the method achieves state-of-the-art classification accuracy, with 3.2× faster inference and a 68% smaller memory footprint than leading approaches.
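For reference, the widely used "improved" Fisher vector of Perronnin and colleagues is written out below for a single cluster c and then concatenated over clusters. The summary does not say which FV variant, covariance structure, or normalization the paper uses, so the diagonal-covariance GMM, the mean and variance gradients, and the power/L2 normalization here are assumptions; divisions and squares of vectors are element-wise.

```latex
% Cluster c holds T patch embeddings x_1, ..., x_T in R^D.
% Its GMM has K components: weights w_k, means \mu_k, diagonal std. devs \sigma_k.
\gamma_t(k) = \frac{w_k\,\mathcal{N}\!\left(x_t;\mu_k,\operatorname{diag}(\sigma_k^2)\right)}
                   {\sum_{j=1}^{K} w_j\,\mathcal{N}\!\left(x_t;\mu_j,\operatorname{diag}(\sigma_j^2)\right)}

\mathcal{G}_{\mu,k} = \frac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\,\frac{x_t-\mu_k}{\sigma_k},
\qquad
\mathcal{G}_{\sigma,k} = \frac{1}{T\sqrt{2w_k}} \sum_{t=1}^{T} \gamma_t(k)\left[\frac{(x_t-\mu_k)^2}{\sigma_k^2}-1\right]

\mathrm{FV}^{(c)} = \left[\mathcal{G}_{\mu,1},\dots,\mathcal{G}_{\mu,K},\;\mathcal{G}_{\sigma,1},\dots,\mathcal{G}_{\sigma,K}\right] \in \mathbb{R}^{2KD}

% Power and L2 normalization, applied to each FV^{(c)}:
z \leftarrow \operatorname{sign}(z)\,|z|^{1/2}, \qquad z \leftarrow z / \lVert z \rVert_2

% Global WSI representation over C clusters:
\Phi(\mathrm{WSI}) = \left[\mathrm{FV}^{(1)},\dots,\mathrm{FV}^{(C)}\right] \in \mathbb{R}^{2CKD}
```

The dimension 2CKD grows with the number of clusters, Gaussians, and embedding size, which is the sense in which the concatenated vector is "high-dimensional" even though it is far smaller than the gigapixel slide itself.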
📝 Abstract
Whole slide images (WSIs) are high-resolution, gigapixel-sized images that pose significant computational challenges for traditional machine learning models due to their size and heterogeneity. In this paper, we present a scalable and efficient methodology for WSI classification that leverages patch-based feature extraction, clustering, and Fisher vector encoding. First, WSIs are divided into fixed-size patches, and deep feature embeddings are extracted from each patch using a pre-trained convolutional neural network (CNN). These patch-level embeddings are then clustered with K-means, so that each cluster aggregates semantically similar regions of the WSI. To summarize each cluster effectively, Fisher vector representations are computed by modeling the distribution of patch embeddings in each cluster with a parametric Gaussian mixture model (GMM). The Fisher vectors from all clusters are concatenated into a high-dimensional feature vector, yielding a compact and informative representation of the entire WSI. This feature vector is then used by a classifier to predict the WSI's diagnostic label. Our method captures both local and global tissue structure and yields robust performance for large-scale WSI classification, demonstrating superior accuracy and scalability compared with other approaches.
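The pipeline above can be sketched in NumPy as follows. This is a minimal illustration under assumptions of ours, not the authors' implementation: random vectors stand in for the pre-trained CNN patch embeddings; K-means and the per-cluster GMMs (diagonal covariance, fitted with a few EM iterations) are learned once on a pool of training patches and shared across slides, so that cluster c always fills the same block of the concatenated vector; and each Fisher vector uses the mean and variance gradients with power and L2 normalization. The abstract specifies none of these choices, all function names are hypothetical, and the downstream classifier is omitted.

```python
import numpy as np

# Hypothetical sketch: names, defaults, and synthetic data are illustrative only.


def assign(X, centers):
    """Index of the nearest center for each row of X (squared Euclidean)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for c in range(n_clusters):
            if np.any(labels == c):  # an emptied cluster keeps its old center
                centers[c] = X[labels == c].mean(axis=0)
    return assign(X, centers), centers


def posteriors(X, w, means, var):
    """Soft assignments gamma[t, k] under a diagonal-covariance GMM."""
    log_p = (-0.5 * (((X[:, None, :] - means[None]) ** 2) / var[None]).sum(-1)
             - 0.5 * np.log(2 * np.pi * var).sum(-1)[None]
             + np.log(w)[None])
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exp
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k, n_iter=30, seed=0, eps=1e-6):
    """A few EM iterations for a K-component, diagonal-covariance GMM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=len(X) < k)].astype(float)
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        gamma = posteriors(X, w, means, var)   # E-step
        nk = gamma.sum(axis=0) + eps           # M-step
        w = nk / nk.sum()
        means = gamma.T @ X / nk[:, None]
        var = np.maximum(gamma.T @ X**2 / nk[:, None] - means**2, eps)
    return w, means, var


def fisher_vector(X, w, means, var, alpha=0.5):
    """Mean/variance Fisher vector (length 2*K*D), power- and L2-normalised."""
    T = len(X)
    gamma = posteriors(X, w, means, var)                       # (T, K)
    diff = (X[:, None, :] - means[None]) / np.sqrt(var)[None]  # (T, K, D)
    g_mu = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    g_sig = (gamma[:, :, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.abs(fv) ** alpha                     # power norm
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


def fit_codebook(train_feats, n_clusters, n_gauss, seed=0):
    """Shared K-means centers plus one GMM per cluster, fitted on training patches."""
    labels, centers = kmeans(train_feats, n_clusters, seed=seed)
    gmms = []
    for c in range(n_clusters):
        Xc = train_feats[labels == c]
        if len(Xc) == 0:  # rare empty cluster: fall back to all training patches
            Xc = train_feats
        gmms.append(fit_diag_gmm(Xc, n_gauss, seed=seed))
    return centers, gmms


def encode_wsi(patch_feats, centers, gmms):
    """Concatenate one Fisher vector per cluster into a fixed-length WSI vector."""
    labels = assign(patch_feats, centers)
    blocks = []
    for c, (w, means, var) in enumerate(gmms):
        Xc = patch_feats[labels == c]
        if len(Xc) == 0:  # cluster absent from this slide: zero block
            blocks.append(np.zeros(2 * means.size))
        else:
            blocks.append(fisher_vector(Xc, w, means, var))
    return np.concatenate(blocks)


rng = np.random.default_rng(0)
train_patches = rng.normal(size=(600, 64))  # stand-in for CNN patch embeddings
centers, gmms = fit_codebook(train_patches, n_clusters=4, n_gauss=3)
slide = rng.normal(size=(250, 64))          # patches from one synthetic "WSI"
phi = encode_wsi(slide, centers, gmms)
print(phi.shape)  # (1536,) = 4 clusters * 2 * 3 Gaussians * 64 dims
```

Sharing the GMMs across slides is a deliberate choice in this sketch: if a GMM were fitted by EM to the very patches it then encodes, the mean gradients would sum to approximately zero at the EM fixed point, leaving little information in the Fisher vector. The resulting `phi` has a fixed length for every slide and can be passed to any standard classifier.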