Adapting a Pre-trained Single-Cell Foundation Model to Spatial Gene Expression Generation from Histology Images

📅 2026-03-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Spatial transcriptomics remains limited by high costs and low throughput, creating an urgent need to accurately predict biologically coherent spatial gene expression from routine H&E histology images. To address this challenge, this work proposes HINGE, a novel framework that, for the first time, adapts a vision-free pretrained single-cell foundation model to the task of tissue image–guided gene expression generation. By integrating SoftAdaLN visual modulation, a masked diffusion objective, and a warm-start curriculum strategy, HINGE effectively mitigates modality mismatch and training instability while preserving inter-gene dependencies during cross-modal conditional generation. Evaluated across three datasets, HINGE substantially outperforms existing methods, achieving significant improvements in average Pearson correlation coefficient, spatial marker accuracy, and gene co-expression consistency.
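The summary mentions a masked diffusion objective in expression space. As a rough illustration of the masking idea only (the paper's actual diffusion formulation is not reproduced here), a minimal sketch of masked-reconstruction training on gene-expression vectors might look like the following; all names and the MSE loss are assumptions, not the authors' code.

```python
# Illustrative sketch of an expression-space masking objective.
# Hypothetical: function names, the zero mask token, and the MSE loss
# are illustrative stand-ins, not HINGE's actual diffusion objective.
import torch

torch.manual_seed(0)

def masked_reconstruction_loss(expr, predict_fn, mask_ratio=0.5):
    """Mask a random subset of genes per spot and score the model's
    prediction on the masked positions only."""
    mask = torch.rand_like(expr) < mask_ratio   # True = gene is masked
    corrupted = expr.masked_fill(mask, 0.0)     # replace with a mask token
    pred = predict_fn(corrupted)
    return ((pred - expr)[mask] ** 2).mean()    # loss only on masked genes

expr = torch.randn(4, 16)        # 4 spots x 16 genes (toy data)
identity_model = lambda x: x     # trivial stand-in for the conditional model
loss = masked_reconstruction_loss(expr, identity_model)
```

Restricting the loss to masked positions forces the model to infer hidden genes from the visible ones (and, in the full method, from the histology condition), which is what preserves inter-gene dependencies.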

πŸ“ Abstract
Spatial transcriptomics (ST) enables spot-level in situ expression profiling, but its high cost and limited throughput motivate predicting expression directly from H&E-stained histology. Recent advances explore score- or flow-based generative models to estimate the conditional distribution of gene expression from histology, offering a flexible alternative to deterministic regression. However, most existing generative approaches omit explicit modeling of gene-gene dependencies, undermining biological coherence. Single-cell foundation models (sc-FMs), pre-trained across diverse cell populations, capture these critical gene relationships that histology alone cannot reveal. Yet applying expression-only sc-FMs to histology-conditioned expression modeling is nontrivial: they lack a visual pathway, their pre-training objective mismatches the conditional ST objective, and mixed-cell ST supervision is scarce. To address these challenges, we propose HINGE (HIstology-coNditioned GEneration), which retrofits a pre-trained sc-FM into a conditional expression generator while largely preserving its learned gene relationships. We achieve this by introducing SoftAdaLN, a lightweight, identity-initialized modulation that injects layer-wise visual context into the backbone, coupled with an expression-space masked diffusion objective and a warm-start curriculum that ensure objective alignment and training stability. Evaluated on three ST datasets, HINGE outperforms state-of-the-art baselines on mean Pearson correlation and yields more accurate spatial marker expression patterns and higher pairwise co-expression consistency, establishing a practical route to adapting pre-trained sc-FMs for histology-conditioned spatial expression generation.
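The abstract describes SoftAdaLN as a lightweight, identity-initialized modulation that injects visual context into the backbone. A minimal sketch of what such a layer could look like, assuming an AdaLN-style design with a zero-initialized conditioning projection (the class and parameter names here are assumptions, not the authors' implementation):

```python
# Hypothetical sketch in the spirit of SoftAdaLN: LayerNorm whose scale
# and shift are predicted from a visual context vector. The projection is
# zero-initialized, so at the start of fine-tuning the layer reduces to
# plain LayerNorm and the pre-trained backbone's behavior is preserved.
import torch
import torch.nn as nn

class SoftAdaLN(nn.Module):
    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # Predict per-feature (shift, scale) offsets from the visual context.
        self.to_mod = nn.Linear(cond_dim, 2 * hidden_dim)
        nn.init.zeros_(self.to_mod.weight)  # identity at initialization
        nn.init.zeros_(self.to_mod.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        shift, scale = self.to_mod(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale) + shift

mod = SoftAdaLN(hidden_dim=8, cond_dim=4)
x = torch.randn(2, 8)       # toy hidden states (2 tokens x 8 features)
cond = torch.randn(2, 4)    # toy visual context vectors
out = mod(x, cond)
# At initialization, out equals plain (affine-free) LayerNorm of x.
```

The zero-initialized projection is what makes the modulation "soft": visual conditioning is learned gradually during fine-tuning rather than disrupting the pre-trained gene representations from step one.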
Problem

Research questions and friction points this paper is trying to address.

spatial transcriptomics
histology image
gene expression generation
single-cell foundation model
gene-gene dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

single-cell foundation model
spatial transcriptomics
histology-conditioned generation
SoftAdaLN
masked diffusion