CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report

📅 2026-03-18

🤖 AI Summary

Generative foundation models designed specifically for histopathology remain scarce, even though they could address tasks beyond the reach of feature extractors, such as virtual staining. To address this gap, we propose CytoSyn, a latent diffusion foundation model for histopathology trained on a dataset drawn from more than 10,000 diagnostic whole-slide images from The Cancer Genome Atlas (TCGA), covering 32 cancer types. We study methodological improvements, training set scaling, sampling strategies, and slide-level overfitting, which culminates in the improved CytoSyn-v2. In an extensive benchmark, the model generates highly realistic and diverse hematoxylin-and-eosin (H&E)-stained images. Although it is trained only on oncology slides, it maintains state-of-the-art performance when generating inflammatory bowel disease images. The model weights, the training and validation datasets, and a sample of synthetic images are publicly released.

📝 Abstract

Computational pathology has made significant progress in recent years, fueling advances in both fundamental disease understanding and clinically ready tools. This evolution is driven by the availability of large amounts of digitized slides and specialized deep learning methods and models. Multiple self-supervised foundation feature extractors have been developed, enabling downstream predictive applications from cell segmentation to tumor sub-typing and survival analysis. In contrast, generative foundation models designed specifically for histopathology remain scarce. Such models could address tasks that are beyond the capabilities of feature extractors, such as virtual staining. In this paper, we introduce CytoSyn, a state-of-the-art foundation latent diffusion model that enables the guided generation of highly realistic and diverse histopathology H&E-stained images, as shown in an extensive benchmark. We explored methodological improvements, training set scaling, sampling strategies and slide-level overfitting, culminating in the improved CytoSyn-v2, and compared our work to PixCell, a state-of-the-art model, in an in-depth manner. This comparison highlighted the strong sensitivity of both diffusion models and performance metrics to preprocessing-specific details such as JPEG compression. Our model has been trained on a dataset obtained from more than 10,000 TCGA diagnostic whole-slide images of 32 different cancer types. Despite being trained only on oncology slides, it maintains state-of-the-art performance generating inflammatory bowel disease images. To support the research community, we publicly release CytoSyn's weights, its training and validation datasets, and a sample of synthetic images in this repository: https://huggingface.co/Owkin-Bioptimus/CytoSyn.

Problem

Research questions and friction points this paper is trying to address.

generative foundation models, histopathology, virtual staining, diffusion models, computational pathology

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation diffusion model, histopathology image generation, virtual staining, latent diffusion, computational pathology
