🤖 AI Summary
This study addresses the lack of systematic, model-agnostic evaluation of foundation models for pixel-level semantic segmentation in histopathology. The authors propose a fine-tuning-free evaluation framework that extracts attention maps from vision or vision-language foundation models—including CLIP, DINO, and CONCH—as pixel-level features and combines them with an XGBoost classifier to uniformly assess the segmentation performance of ten foundation models across four tissue- and cell-level pathology datasets. Experimental results demonstrate that CONCH achieves the best performance, followed by PathDino. Moreover, fusing features from CONCH, PathDino, and CellViT substantially enhances generalization, yielding an average segmentation performance gain of 7.95% across all datasets, thereby validating the effectiveness of complementary cross-model representations.
📝 Abstract
In recent years, foundation models such as CLIP, DINO, and CONCH have demonstrated remarkable domain generalization and unsupervised feature extraction capabilities across diverse imaging tasks. However, systematic and independent evaluations of these models for pixel-level semantic segmentation in histopathology remain scarce. In this study, we propose a robust benchmarking approach to assess 10 foundation models on four histopathological datasets covering both morphological tissue-region and cellular/nuclear segmentation tasks. Our method leverages the attention maps of foundation models as pixel-wise features, which are then classified with XGBoost, a machine learning algorithm, enabling fast, interpretable, and model-agnostic evaluation without fine-tuning. We show that the vision-language foundation model CONCH performed best across datasets compared to vision-only foundation models, with PathDino a close second. Further analysis shows that models trained on distinct histopathology cohorts capture complementary morphological representations, and concatenating their features yields superior segmentation performance. Concatenating features from CONCH, PathDino, and CellViT outperformed individual models across all datasets by 7.95% (averaged across the datasets), suggesting that ensembles of foundation models generalize better to diverse histopathological segmentation tasks.
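The evaluation recipe described above — treating per-pixel features derived from a foundation model's attention maps as inputs to a boosted-tree classifier, and optionally concatenating features from several models — can be sketched as follows. This is a minimal illustration with synthetic stand-in features, not the authors' code: the feature arrays, shapes, and labels are invented, and `GradientBoostingClassifier` is used as a stand-in where the XGBoost library is unavailable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for per-pixel features extracted from a foundation model's
# attention maps (e.g. upsampled ViT attention heads). The shapes and the
# binary "tissue vs. background" labels below are synthetic.
H, W, D = 16, 16, 8                        # image height/width, feature dim
feats_model_a = rng.normal(size=(H * W, D))  # e.g. one model's features
feats_model_b = rng.normal(size=(H * W, D))  # e.g. another model's features
labels = (feats_model_a[:, 0] + feats_model_b[:, 0] > 0).astype(int)

# Single-model evaluation: fit a boosted-tree classifier on one model's
# pixel-wise features. The paper uses XGBoost; GradientBoostingClassifier
# plays that role here.
clf_single = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf_single.fit(feats_model_a, labels)
acc_single = clf_single.score(feats_model_a, labels)

# Multi-model fusion: concatenate per-pixel features from several models
# before classification, mirroring the CONCH + PathDino + CellViT ensemble.
fused = np.concatenate([feats_model_a, feats_model_b], axis=1)
clf_fused = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf_fused.fit(fused, labels)
acc_fused = clf_fused.score(fused, labels)

print(f"single-model acc: {acc_single:.3f}, fused acc: {acc_fused:.3f}")
```

In a real run, a proper train/test split and a segmentation metric such as Dice or mIoU would replace the training accuracy used here for brevity; the key point is that fusion only changes the feature matrix, not the classifier.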