Immunocto: a massive immune cell database auto-generated for histopathology

📅 2024-06-03
🏛️ arXiv.org
🤖 AI Summary
Tumor immune microenvironment (TIME) research is held back by the lack of accurate immune cell annotations on hematoxylin and eosin (H&E)-stained histopathology slides. This work proposes a workflow, requiring minimal human intervention, that builds an immune cell database from tissue sections stained with both H&E and multiplexed immunofluorescence (IF). The workflow combines the Segment Anything Model (SAM), dual-modality image registration, and weakly supervised segmentation to produce nucleus-level contours and automated subtype labels. The resulting Immunocto database contains 6,848,454 cells and objects, including 2,282,818 immune cells across four subtypes: CD4+ T cells, CD8+ T cells, CD20+ B cells, and CD68+/CD163+ macrophages. Each entry is a 64×64-pixel H&E image at 40× magnification with a binary nucleus mask and a label. The database is publicly available, which reduces reliance on manual annotation, and deep learning models trained on it achieve state-of-the-art performance for lymphocyte detection.

📝 Abstract
With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment (TIME) is crucial to inform on prognosis and understand potential response to therapeutic agents. A key approach to characterising the TIME may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. In this work, we introduce a workflow to automatically generate robust single cell contours and labels from dually stained tissue sections with H&E and multiplexed immunofluorescence (IF) markers. The approach harnesses the Segment Anything Model and requires minimal human intervention compared to existing single cell databases. With this methodology, we create Immunocto, a massive, multi-million automatically generated database of 6,848,454 human cells and objects, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, CD20$^+$ B cell lymphocytes, and CD68$^+$/CD163$^+$ macrophages. For each cell, we provide a 64$\times$64 pixels$^2$ H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. The database, which is made publicly available, can be used to train models to study the TIME on routine H&E slides. We show that deep learning models trained on Immunocto result in state-of-the-art performance for lymphocyte detection. The approach demonstrates the benefits of using matched H&E and IF data to generate robust databases for computational pathology applications.
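
The per-cell entry described in the abstract (a 64×64 H&E image, a binary nucleus mask, and a label) can be pictured with a minimal Python sketch. The class names, the byte layout, and the `OTHER` catch-all label are hypothetical and chosen only for illustration; the actual file format and label names of the released database are not stated here.

```python
from dataclasses import dataclass
from enum import Enum

PATCH_SIZE = 64  # pixels per side, as stated in the abstract


class CellLabel(Enum):
    """The four immune subtypes named in the abstract, plus an assumed catch-all."""
    CD4_T_CELL = "CD4+ T cell"
    CD8_T_CELL = "CD8+ T cell"
    CD20_B_CELL = "CD20+ B cell"
    MACROPHAGE = "CD68+/CD163+ macrophage"
    OTHER = "other cell or object"  # assumption: non-immune entries


@dataclass(frozen=True)
class CellRecord:
    """One database entry; the in-memory layout is an assumption."""
    image: bytes  # RGB, row-major: PATCH_SIZE * PATCH_SIZE * 3 bytes
    mask: bytes   # one byte per pixel, 1 inside the nucleus, 0 outside
    label: CellLabel

    def __post_init__(self) -> None:
        n_pixels = PATCH_SIZE * PATCH_SIZE
        if len(self.image) != 3 * n_pixels:
            raise ValueError("image must be a 64x64 RGB patch")
        if len(self.mask) != n_pixels:
            raise ValueError("mask must have one value per pixel")
        if any(value not in (0, 1) for value in self.mask):
            raise ValueError("mask must be binary")

    def nucleus_area(self) -> int:
        """Number of pixels covered by the nucleus mask."""
        return sum(self.mask)


# Counts reported in the abstract.
TOTAL_OBJECTS = 6_848_454
IMMUNE_CELLS = 2_282_818
immune_fraction = IMMUNE_CELLS / TOTAL_OBJECTS
```

With the reported counts, the immune cells make up exactly one third of the database, since 3 × 2,282,818 = 6,848,454.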
Problem

Research questions and friction points this paper is trying to address.

Automated immune cell detection from histopathology images.
Creation of a massive immune cell database for cancer research.
Improving tumor immune microenvironment analysis using deep learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated immune cell detection and classification
Segment Anything Model for single cell contours
Massive database with H&E and IF data
Mikaël Simard
Medical Physics, UCL, London, UK
Zhuoyan Shen
Medical Physics, UCL, London, UK
Maria A Hawkins
Medical Physics, UCL, London, UK; Radiotherapy, UCLH, London, UK
C. Collins-Fekete
Medical Physics, UCL, London, UK