Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping

📅 2025-06-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Histopathology foundation models can generalize poorly across centers, and robust metrics for their stability under distribution shift are lacking in skin cancer subtype classification. Method: We leverage AI4SkIN, a multi-center whole-slide image dataset, to systematically benchmark computational pathology foundation models as patch-level feature extractors within a multiple-instance learning (MIL) framework. We propose the FM-Silhouette Index (FM-SI), a novel metric that quantifies feature consistency under distribution shift. Results: Experiments show that extracting less biased features improves classification performance, especially for similarity-based MIL classifiers. The benchmark offers a reproducible setting for assessing computational pathology foundation models on real-world, multi-center data.

📝 Abstract

Pretraining on large-scale, in-domain datasets grants histopathology foundation models (FMs) the ability to learn task-agnostic data representations, enhancing transfer learning on downstream tasks. In computational pathology, automated whole slide image analysis requires multiple instance learning (MIL) frameworks due to the gigapixel scale of the slides. The diversity among histopathology FMs has highlighted the need to design real-world challenges for evaluating their effectiveness. To bridge this gap, our work presents a novel benchmark for evaluating histopathology FMs as patch-level feature extractors within a MIL classification framework. For that purpose, we leverage the AI4SkIN dataset, a multi-center cohort encompassing slides with challenging cutaneous spindle cell neoplasm subtypes. We also define the Foundation Model - Silhouette Index (FM-SI), a novel metric to measure model consistency against distribution shifts. Our experiments show that extracting less biased features enhances classification performance, especially in similarity-based MIL classifiers.
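
To make the setup in the abstract concrete, the following is a minimal sketch of a similarity-based MIL classifier operating on patch-level features. It is an illustration under stated assumptions, not the paper's implementation: the summary does not specify which MIL classifiers or aggregation functions were used, so mean pooling, cosine similarity, class prototypes, and all function names here (`mean_pool`, `fit_prototypes`, `predict`) are hypothetical choices. In practice the patch feature vectors would come from a frozen foundation model; toy 2-D vectors stand in for them here.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors (patches -> slide embedding)."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_prototypes(bags, labels):
    """Build one prototype per class: the mean of that class's slide embeddings."""
    by_class = {}
    for bag, label in zip(bags, labels):
        by_class.setdefault(label, []).append(mean_pool(bag))
    return {label: mean_pool(embs) for label, embs in by_class.items()}


def predict(bag, prototypes):
    """Assign the class whose prototype is most similar to the slide embedding."""
    emb = mean_pool(bag)
    return max(prototypes, key=lambda label: cosine(emb, prototypes[label]))


# Toy bags: each slide is a list of patch feature vectors (assumed, not real data).
train_bags = [
    [[1.0, 0.1], [0.9, 0.0]],  # slide of hypothetical subtype "A"
    [[0.8, 0.2], [1.0, 0.1]],  # subtype "A"
    [[0.1, 1.0], [0.0, 0.9]],  # slide of hypothetical subtype "B"
    [[0.2, 0.8], [0.1, 1.0]],  # subtype "B"
]
train_labels = ["A", "A", "B", "B"]

prototypes = fit_prototypes(train_bags, train_labels)
print(predict([[0.9, 0.2], [0.7, 0.1]], prototypes))
```

Because the decision depends only on distances in feature space, any center-specific bias in the extracted features shifts slide embeddings directly, which is one plausible reason such classifiers would be sensitive to feature bias.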

Problem

Research questions and friction points this paper is trying to address.

Evaluating histopathology foundation models for skin cancer subtyping

Assessing model consistency against distribution shifts in pathology

Improving classification performance by extracting less biased features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretraining on large-scale histopathology datasets

Multiple instance learning for gigapixel image analysis

Novel FM-SI metric for model consistency evaluation
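
The exact definition of FM-SI is not given in this summary. As a rough illustration of the idea its name suggests, the sketch below computes a standard mean silhouette coefficient over slide embeddings, using the originating center as the cluster label. This is an assumption, not the paper's formula: under it, a score near 1 means embeddings cluster by center (strong center bias), while a score near or below 0 means centers are mixed in feature space. The function names and toy data are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_score(embeddings, groups):
    """Mean silhouette coefficient of `embeddings` grouped by `groups`.

    Points in singleton groups score 0 (the usual convention); if only one
    group exists the coefficient is undefined and 0 is used here as well.
    """
    n = len(embeddings)
    scores = []
    for i in range(n):
        # Collect distances from point i to every other point, per group.
        dists = {}
        for j in range(n):
            if i != j:
                d = euclidean(embeddings[i], embeddings[j])
                dists.setdefault(groups[j], []).append(d)
        own = dists.pop(groups[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # mean intra-group distance
        b = min(sum(d) / len(d) for d in dists.values())     # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy slide embeddings (assumed): two tight clumps far apart.
embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]

# Case 1: each clump is a single center -> features separate by center.
biased = silhouette_score(embeddings, ["c1", "c1", "c2", "c2"])

# Case 2: centers are spread across both clumps -> features mix centers.
mixed = silhouette_score(embeddings, ["c1", "c2", "c1", "c2"])

print(biased > mixed)
```

Under this reading, a feature extractor with a lower center-wise silhouette would be the more consistent one across centers.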
