Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

πŸ“… 2026-03-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Spatial transcriptomics is hindered by high costs, low throughput, and scarce data, impeding the development of robust computational models. To address this, the authors propose the C2L-ST framework, which first pretrains a global diffusion model on large-scale histopathology images to learn morphological priors, then performs lightweight gene-conditioned modulation of a local model using only a small amount of institution-specific image–gene paired data. The approach integrates global morphological representations with local molecular guidance, enabling cross-institutional adaptability, scalability, and data-efficient generative augmentation. Experiments show that the synthesized data closely approximates real samples in visual fidelity, cellular composition, and embedding distribution. In downstream tasks, models trained on the generated data achieve gene expression prediction accuracy and spatial consistency comparable to those obtained with full real datasets, using only a few sampled locations.
πŸ“ Abstract
Spatial Transcriptomics (ST) provides spatially resolved gene expression profiles within intact tissue architecture, enabling molecular analysis in histological context. However, the high cost, limited throughput, and restricted data sharing of ST experiments result in severe data scarcity, constraining the development of robust computational models. To address this limitation, we present a Central-to-Local adaptive generative diffusion framework for ST (C2L-ST) that integrates large-scale morphological priors with limited molecular guidance. A global central model is first pretrained on extensive histopathology datasets to learn transferable morphological representations, and institution-specific local models are then adapted through lightweight gene-conditioned modulation using a small number of paired image-gene spots. This strategy enables the synthesis of realistic and molecularly consistent histology patches under data-limited conditions. The generated images exhibit high visual and structural fidelity, reproduce cellular composition, and show strong embedding overlap with real data across multiple organs, reflecting both realism and diversity. When incorporated into downstream training, synthetic image-gene pairs improve gene expression prediction accuracy and spatial coherence, achieving performance comparable to real data while requiring only a fraction of sampled spots. C2L-ST provides a scalable and data-efficient framework for molecular-level data augmentation, offering a domain-adaptive and generalizable approach for integrating histology and transcriptomics in spatial biology and related fields.
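The abstract describes the local adaptation step only as "lightweight gene-conditioned modulation" and does not specify the mechanism. As a purely illustrative sketch, one common way to realize lightweight conditioning is FiLM-style feature modulation, where a small set of trainable projections maps the gene expression vector of a spot to per-channel scale and shift parameters applied to the frozen central model's feature maps. The function name `film_modulation` and the matrices `W_gamma`, `W_beta` below are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulation(features, gene_vec, W_gamma, W_beta):
    """FiLM-style conditioning: scale and shift feature maps with
    parameters predicted from a spot's gene expression vector.

    features: (C, H, W) feature map from the frozen central model
    gene_vec: (G,) gene expression profile for one spot
    W_gamma, W_beta: (C, G) lightweight projections -- in this sketch,
        the only parameters that would be trained during local adaptation.
    """
    gamma = W_gamma @ gene_vec  # (C,) per-channel scale offset
    beta = W_beta @ gene_vec    # (C,) per-channel shift
    # Broadcast the per-channel parameters over the spatial dimensions.
    return (1.0 + gamma)[:, None, None] * features + beta[:, None, None]

# Toy dimensions: 8 channels, a 4x4 feature map, 50 genes.
C, H, W, G = 8, 4, 4, 50
features = rng.standard_normal((C, H, W))
gene_vec = rng.standard_normal(G)
# Small initial weights keep the modulated features close to the prior.
W_gamma = 0.01 * rng.standard_normal((C, G))
W_beta = 0.01 * rng.standard_normal((C, G))

out = film_modulation(features, gene_vec, W_gamma, W_beta)
print(out.shape)  # (8, 4, 4)
```

Because only the two projection matrices are updated, this kind of conditioning can in principle be fit from a small number of paired image–gene spots, which matches the data-limited setting the paper targets.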
Problem

Research questions and friction points this paper is trying to address.

Spatial Transcriptomics
data scarcity
gene expression prediction
data-limited
computational modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative diffusion
spatial transcriptomics
data augmentation
domain adaptation
morphology-guided generation
Yaoyu Fang
Department of Radiology, Northwestern University
Jiahe Qian
Department of Radiology, Northwestern University
Xinkun Wang
Department of Cell and Developmental Biology, Northwestern University
Lee A. Cooper
Department of Pathology, Northwestern University
Bo Zhou
Northwestern University
Medical AI · Medical Imaging · Medical Image Analysis · Deep Learning · Medical Physics