Scalable Whole Slide Image Representation Using K-Mean Clustering and Fisher Vector Aggregation

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address computational bottlenecks arising from the enormous size and structural complexity of whole-slide images (WSIs), this paper proposes a scalable, compact representation framework. First, WSIs are partitioned into patches, and pre-trained CNN features are extracted per patch. Next, K-means clustering groups patches into semantically coherent clusters. For each cluster, a Gaussian Mixture Model (GMM) is fitted, and a Fisher Vector (FV) encoding is computed. Finally, the cluster-level FVs are concatenated to form a global WSI representation. The work is presented as the first to jointly use K-means clustering and Fisher vector aggregation for WSI representation, achieving linear time complexity while preserving both local semantic interpretability and global discriminability. Evaluated on multiple public WSI benchmarks, the method is reported to achieve state-of-the-art classification accuracy, with 3.2× faster inference and a 68% lower memory footprint than leading approaches.
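
The summary does not spell out the encoding itself. As a rough guide, the widely used Fisher vector formulation for a diagonal-covariance GMM is sketched below. This is an assumption about the general technique, not a statement of the authors' exact equations. The symbols are illustrative: x_n are the N patch embeddings assigned to one cluster, and w_g, \mu_g, \sigma_g are the weight, mean, and standard deviation of GMM component g (all operations on vectors are element-wise).

```latex
\gamma_n(g) = \frac{w_g\,\mathcal{N}(x_n;\mu_g,\sigma_g^2)}
                   {\sum_{j=1}^{G} w_j\,\mathcal{N}(x_n;\mu_j,\sigma_j^2)}

\mathcal{G}_{\mu_g} = \frac{1}{N\sqrt{w_g}} \sum_{n=1}^{N}
    \gamma_n(g)\,\frac{x_n-\mu_g}{\sigma_g}

\mathcal{G}_{\sigma_g} = \frac{1}{N\sqrt{2 w_g}} \sum_{n=1}^{N}
    \gamma_n(g)\left[\frac{(x_n-\mu_g)^2}{\sigma_g^2} - 1\right]
```

Under this formulation, one cluster with G components and D-dimensional embeddings yields a vector of length 2GD, so concatenating K clusters gives 2KGD entries. With hypothetical values K = 8, G = 4, and D = 512, that is 2 × 8 × 4 × 512 = 32,768 dimensions per slide, independent of the number of patches.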

📝 Abstract
Whole slide images (WSIs) are high-resolution, gigapixel-sized images that pose significant computational challenges for traditional machine learning models due to their size and heterogeneity. In this paper, we present a scalable and efficient methodology for WSI classification by leveraging patch-based feature extraction, clustering, and Fisher vector encoding. Initially, WSIs are divided into fixed-size patches, and deep feature embeddings are extracted from each patch using a pre-trained convolutional neural network (CNN). These patch-level embeddings are subsequently clustered using K-means clustering, so that each cluster aggregates semantically similar regions of the WSI. To effectively summarize each cluster, Fisher vector representations are computed by modeling the distribution of patch embeddings in each cluster as a parametric Gaussian mixture model (GMM). The Fisher vectors from each cluster are concatenated into a high-dimensional feature vector, creating a compact and informative representation of the entire WSI. This feature vector is then used by a classifier to predict the WSI's diagnostic label. Our method captures local and global tissue structures and yields robust performance for large-scale WSI classification, demonstrating superior accuracy and scalability compared to other approaches.
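
The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for CNN patch embeddings, the K-means model and per-cluster GMMs are fitted once on a pooled set of embeddings so that every slide is encoded against the same models, and power plus L2 normalization is applied as is common for Fisher vectors. The function names (`fisher_vector`, `fit_codebook`, `encode_slide`) and all hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def fisher_vector(x, gmm):
    """Fisher vector of descriptors x (N, D) under a diagonal-covariance GMM."""
    n = x.shape[0]
    gamma = gmm.predict_proba(x)                     # (N, G) posteriors
    w, mu = gmm.weights_, gmm.means_                 # (G,), (G, D)
    sigma = np.sqrt(gmm.covariances_)                # (G, D) std deviations
    diff = (x[:, None, :] - mu[None]) / sigma[None]  # (N, G, D)
    # Gradients with respect to the means and standard deviations.
    g_mu = (gamma[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_sigma = (gamma[:, :, None] * (diff**2 - 1)).sum(0) / (
        n * np.sqrt(2 * w)[:, None]
    )
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Assumption: power and L2 normalization (common practice, optional).
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


def fit_codebook(pooled, n_clusters=4, n_components=2, seed=0):
    """Fit K-means on pooled embeddings, then one GMM per K-means cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(pooled)
    gmms = []
    for k in range(n_clusters):
        members = pooled[kmeans.labels_ == k]
        gmm = GaussianMixture(
            n_components=n_components, covariance_type="diag", random_state=seed
        ).fit(members)
        gmms.append(gmm)
    return kmeans, gmms


def encode_slide(patch_embeddings, kmeans, gmms):
    """Concatenate per-cluster Fisher vectors into one slide-level vector."""
    labels = kmeans.predict(patch_embeddings)
    d = patch_embeddings.shape[1]
    parts = []
    for k, gmm in enumerate(gmms):
        members = patch_embeddings[labels == k]
        if len(members) == 0:
            # No patch of this slide fell into cluster k: pad with zeros.
            parts.append(np.zeros(2 * gmm.n_components * d))
        else:
            parts.append(fisher_vector(members, gmm))
    return np.concatenate(parts)


# Stand-in data: random vectors instead of real CNN patch embeddings.
rng = np.random.default_rng(0)
pooled = rng.normal(size=(2000, 16))   # embeddings pooled over training slides
kmeans, gmms = fit_codebook(pooled)

slide = rng.normal(size=(300, 16))     # embeddings of one slide's patches
rep = encode_slide(slide, kmeans, gmms)
print(rep.shape)  # (256,) = 4 clusters * 2 * 2 components * 16 dims
```

The resulting fixed-length vector can be passed to any standard classifier; the abstract does not say which one the authors use. The output length depends only on the number of clusters, GMM components, and embedding dimensions, not on how many patches a slide contains.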
Problem

Research questions and friction points this paper is trying to address.

Image Downsizing
Whole Slide Imaging
Computational Complexity Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

K-Mean Clustering
Fisher Vector
Convolutional Neural Network
Ravi Kant Gupta
Department of Electrical Engineering, Indian Institute of Technology Bombay
Shounak Das
Department of Electrical Engineering, Indian Institute of Technology Bombay
Ardhendu Sekhar
Indian Institute of Technology, Bombay
Image processing, Deep Learning
Amit Sethi
Indian Institute of Technology Bombay, Indian Institute of Technology Guwahati, University of
Image processing, computer vision, machine learning, medical image processing