CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report

📅 2026-03-18
🤖 AI Summary
Generative foundation models built specifically for histopathology remain scarce, even though they could address tasks beyond the reach of feature extractors, such as virtual staining. CytoSyn is a latent diffusion foundation model for guided generation of realistic and diverse hematoxylin-and-eosin (H&E)-stained images, trained on a dataset derived from more than 10,000 diagnostic whole-slide images from The Cancer Genome Atlas (TCGA) spanning 32 cancer types. The authors study methodological improvements, training set scaling, sampling strategies and slide-level overfitting, leading to the improved CytoSyn-v2, and report state-of-the-art results in an extensive benchmark that includes an in-depth comparison with PixCell. Although trained only on oncology slides, the model maintains state-of-the-art performance when generating inflammatory bowel disease images. The model weights, the training and validation datasets, and a sample of synthetic images are publicly released.

📝 Abstract
Computational pathology has made significant progress in recent years, fueling advances in both fundamental disease understanding and clinically ready tools. This evolution is driven by the availability of large amounts of digitized slides and by specialized deep learning methods and models. Multiple self-supervised foundation feature extractors have been developed, enabling downstream predictive applications from cell segmentation to tumor sub-typing and survival analysis. In contrast, generative foundation models designed specifically for histopathology remain scarce. Such models could address tasks that are beyond the capabilities of feature extractors, such as virtual staining. In this paper, we introduce CytoSyn, a state-of-the-art foundation latent diffusion model that enables the guided generation of highly realistic and diverse histopathology H&E-stained images, as shown in an extensive benchmark. We explored methodological improvements, training set scaling, sampling strategies and slide-level overfitting, culminating in the improved CytoSyn-v2, and compared our work in depth to PixCell, a state-of-the-art model. This comparison highlighted the strong sensitivity of both diffusion models and performance metrics to preprocessing details such as JPEG compression. Our model was trained on a dataset obtained from more than 10,000 TCGA diagnostic whole-slide images covering 32 different cancer types. Despite being trained only on oncology slides, it maintains state-of-the-art performance when generating inflammatory bowel disease images. To support the research community, we publicly release CytoSyn's weights, its training and validation datasets, and a sample of synthetic images in this repository: https://huggingface.co/Owkin-Bioptimus/CytoSyn.
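
The abstract mentions slide-level overfitting and separate training and validation datasets, but does not say how the two sets are kept apart. As a minimal sketch only, and not the authors' procedure, one common safeguard is to assign whole slides (rather than individual tiles) to a split, so that no slide contributes tiles to both sides. The tile naming scheme, the function names and the 20% validation fraction below are all hypothetical.

```python
import hashlib


def slide_of(tile_name: str) -> str:
    """Return the slide ID of a tile.

    Assumes a hypothetical naming scheme '<slide_id>__x<col>_y<row>.png'.
    """
    return tile_name.split("__", 1)[0]


def in_validation(slide_id: str, val_fraction: float) -> bool:
    """Deterministically map a slide ID to a bucket in [0, 1)."""
    digest = hashlib.sha256(slide_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < val_fraction


def split_by_slide(tile_names, val_fraction=0.2):
    """Split tiles so that all tiles of a slide land in the same split."""
    train, val = [], []
    for name in tile_names:
        target = val if in_validation(slide_of(name), val_fraction) else train
        target.append(name)
    return train, val


# Toy data: 50 made-up slides with 4 tiles each.
tiles = [f"SLIDE-{s:03d}__x{x}_y0.png" for s in range(50) for x in range(4)]
train_tiles, val_tiles = split_by_slide(tiles)

train_slides = {slide_of(t) for t in train_tiles}
val_slides = {slide_of(t) for t in val_tiles}
print(len(train_slides), "training slides,", len(val_slides), "validation slides")
print("slides shared between splits:", train_slides & val_slides)
```

Hashing the slide ID instead of drawing random numbers keeps the assignment stable across runs and across machines, so a slide never migrates between splits when new tiles are added.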
Problem

Research questions and friction points this paper is trying to address.

generative foundation models
histopathology
virtual staining
diffusion models
computational pathology
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation diffusion model
histopathology image generation
virtual staining
latent diffusion
computational pathology
Authors

Thomas Duboudin
Owkin, Inc
Xavier Fontaine
Owkin, Inc
Etienne Andrier
Owkin, Inc
Lionel Guillou
Owkin, Inc
Alexandre Filiot
Owkin, Inc
Thalyssa Baiocco-Rodrigues
Owkin, Inc
Antoine Olivier
Owkin
Alberto Romagnoni
Owkin Inc.
Cancer Research, Data Science, Mathematical and Computational Neuroscience, Theoretical High Energy Physics
John Klein
Carnegie Mellon Software Engineering Institute
Jean-Baptiste Schiratti
Owkin, Inc