SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing public histopathology datasets suffer from narrow organ coverage, incomplete class representation, and low-quality annotations, which hinders AI research in computational pathology. To address these limitations, we introduce SPIDER, the largest publicly available multi-organ (skin, colorectal, thorax) patch-level histopathology image dataset to date. It features comprehensive tissue-type annotations verified by expert pathologists and supplies surrounding context patches alongside each labeled patch. Our baseline models use the Hibou-L foundation model as a feature extractor with an attention-based classification head, supporting both identification of significant areas and quantitative tissue analysis. They achieve state-of-the-art classification performance across multiple tissue categories. Both the SPIDER dataset and the trained models are publicly available, which supports reproducibility in digital pathology research and lays a foundation for multimodal approaches.
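To make the idea of an attention-based classification head over a central patch and its surrounding context patches more concrete, here is a minimal sketch in dependency-free Python. It is not the authors' implementation: the `AttentionHead` class, the layer sizes, the random weights, and the choice of nine embeddings (one central patch plus eight neighbours) are all assumptions for illustration, and random vectors stand in for the embeddings a feature extractor such as Hibou-L would produce.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


class AttentionHead:
    """Hypothetical attention-pooling classifier over patch embeddings."""

    def __init__(self, dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.proj = init(hidden, dim)           # projects each embedding
        self.score = init(1, hidden)[0]         # scores each projected embedding
        self.classifier = init(n_classes, dim)  # maps pooled vector to class logits

    def forward(self, embeddings):
        # One attention logit per patch: score . tanh(proj @ x)
        logits = [
            dot(self.score, [math.tanh(h) for h in matvec(self.proj, x)])
            for x in embeddings
        ]
        weights = softmax(logits)
        # Attention-weighted average of the patch embeddings
        dim = len(embeddings[0])
        pooled = [sum(w * x[i] for w, x in zip(weights, embeddings)) for i in range(dim)]
        probs = softmax(matvec(self.classifier, pooled))
        return probs, weights


# Stand-in embeddings: 1 central patch + 8 context patches (assumed layout),
# each a 16-dimensional vector instead of real foundation-model features.
rng = random.Random(1)
patches = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(9)]

head = AttentionHead(dim=16, hidden=8, n_classes=4)
probs, weights = head.forward(patches)
print("class probabilities:", [round(p, 3) for p in probs])
print("attention weights:", [round(w, 3) for w in weights])
```

The attention weights show how much each patch contributes to the pooled representation, which is one plausible way context patches could inform the prediction for the central patch. In practice the weights would be learned rather than randomly initialized.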

📝 Abstract
Advancing AI in computational pathology requires large, high-quality, and diverse datasets, yet existing public datasets are often limited in organ diversity, class coverage, or annotation quality. To bridge this gap, we introduce SPIDER (Supervised Pathology Image-DEscription Repository), the largest publicly available patch-level dataset covering multiple organ types, including Skin, Colorectal, and Thorax, with comprehensive class coverage for each organ. SPIDER provides high-quality annotations verified by expert pathologists and includes surrounding context patches, which enhance classification performance by providing spatial context. Alongside the dataset, we present baseline models trained on SPIDER using the Hibou-L foundation model as a feature extractor combined with an attention-based classification head. The models achieve state-of-the-art performance across multiple tissue categories and serve as strong benchmarks for future digital pathology research. Beyond patch classification, the model enables rapid identification of significant areas and quantitative tissue metrics, and it establishes a foundation for multimodal approaches. Both the dataset and trained models are publicly available to advance research, reproducibility, and AI-driven pathology development. Access them at: https://github.com/HistAI/SPIDER
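As a rough illustration of how patch-level predictions could be turned into quantitative tissue metrics and a list of significant areas, the sketch below aggregates predicted labels over a patch grid. The function names, the grid coordinates, and the class labels ("stroma", "tumor", "necrosis") are placeholders chosen for this example and are not taken from the SPIDER label set or the authors' code.

```python
from collections import Counter


def tissue_fractions(patch_labels):
    """Return the share of classified patches per predicted tissue class."""
    counts = Counter(patch_labels.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def flag_patches(patch_labels, classes_of_interest):
    """Return sorted grid coordinates whose predicted class is of interest."""
    return sorted(
        pos for pos, label in patch_labels.items() if label in classes_of_interest
    )


# Hypothetical predictions keyed by (row, col) position on the patch grid.
predictions = {
    (0, 0): "stroma", (0, 1): "stroma", (0, 2): "tumor",
    (1, 0): "necrosis", (1, 1): "tumor", (1, 2): "tumor",
    (2, 0): "stroma", (2, 1): "stroma",
}

fractions = tissue_fractions(predictions)
flagged = flag_patches(predictions, {"tumor"})
print(fractions)
print(flagged)
```

Because every patch covers the same area, the per-class patch fraction approximates the per-class tissue area fraction, and the flagged coordinates point a reviewer to the regions worth inspecting first.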
Problem

Research questions and friction points this paper is trying to address.

Lack of diverse, high-quality datasets in computational pathology.
Need for comprehensive multi-organ pathology datasets with expert annotations.
Establishing benchmarks for AI-driven pathology research and development.
Innovation

Methods, ideas, or system contributions that make the work stand out.

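SPIDER, the largest publicly available multi-organ patch-level pathology dataset.
Baseline models that pair the Hibou-L feature extractor with an attention-based classification head.
Publicly released dataset and trained models for AI-driven pathology research.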