🤖 AI Summary
This study addresses insufficient alignment between histomorphological and transcriptomic representations when predicting gene expression profiles from histopathology images. We propose a dual-pathway, multi-level discriminative framework: (1) a multi-scale feature extraction module builds parallel morphological and transcriptional encoders; (2) instance-level contrastive learning is combined with cross-hierarchical (instance-to-group) contrastive alignment, so that the two modalities are aligned at multiple granularities. Evaluated on multiple public spatial transcriptomics datasets, our method outperforms existing approaches, with an average 12.6% improvement in R², and generalizes well across diverse tissue types and experimental platforms. The core contribution is the first incorporation of hierarchical contrastive learning into a dual-pathway architecture, which allows multi-level associations, from cellular morphology to molecular expression patterns, to be modeled systematically.
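For readers unfamiliar with the reported metric, the following is a minimal sketch of how a per-gene R² could be computed and averaged. It assumes R² means the standard coefficient of determination evaluated per gene across spots; the summary does not state which definition the authors use, nor whether the 12.6% figure is relative or absolute. The function names `r_squared` and `mean_r_squared` are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination for one gene across spots (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        return float("nan")  # undefined when the gene is constant across spots
    return 1.0 - ss_res / ss_tot


def mean_r_squared(true_by_gene, pred_by_gene):
    """Average R² over genes, skipping genes where it is undefined."""
    scores = [r_squared(t, p) for t, p in zip(true_by_gene, pred_by_gene)]
    valid = [s for s in scores if s == s]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)
```

Each inner list holds one gene's values across all spots, so `true_by_gene[g][s]` is the measured expression of gene `g` at spot `s`.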
📝 Abstract
Accurately predicting gene expression from histopathology images offers a scalable and non-invasive approach to molecular profiling, with significant implications for precision medicine and computational pathology. However, existing methods often underexploit cross-modal alignment between histopathology images and gene expression profiles across multiple representational levels, which limits their prediction performance. To address this, we propose Gene-DML, a unified framework that structures the latent space through Dual-pathway Multi-Level discrimination to enhance correspondence between morphological and transcriptional modalities. The multi-scale instance-level discrimination pathway aligns hierarchical histopathology representations extracted at local, neighbor, and global levels with gene expression profiles, capturing scale-aware morphological-transcriptional relationships. In parallel, the cross-level instance-group discrimination pathway enforces structural consistency between individual (image/gene) instances and groups from the other modality (gene/image, respectively), strengthening the alignment across modalities. By jointly modelling fine-grained and structural-level discrimination, Gene-DML learns robust cross-modal representations, improving both predictive accuracy and generalization across diverse biological contexts. Extensive experiments on public spatial transcriptomics datasets demonstrate that Gene-DML achieves state-of-the-art performance in gene expression prediction. The code and checkpoints will be released soon.
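The two pathways described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no loss formulas, so the sketch assumes an InfoNCE-style contrastive objective for both pathways, assumes groups are given as precomputed labels (for example from clustering), and arbitrarily uses the local image embeddings for the instance-group term. All function names, the `temperature` and `weight` parameters, and the equal weighting of scales are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # leave zero vectors unchanged
    return [x / norm for x in v]


def softmax_nll(query, keys, positive, temperature):
    """Negative log-probability of the positive key under a softmax over all keys."""
    logits = [dot(query, k) / temperature for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[positive]


def info_nce(queries, keys, temperature=0.1):
    """Instance-level term: query i should match key i; other keys are negatives."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    return sum(softmax_nll(qi, k, i, temperature) for i, qi in enumerate(q)) / len(q)


def group_centroids(embeddings, labels):
    """Mean embedding per group label (assumed way of representing a group)."""
    members = {}
    for v, g in zip(embeddings, labels):
        members.setdefault(g, []).append(v)
    return {g: [sum(col) / len(vs) for col in zip(*vs)] for g, vs in members.items()}


def instance_group_loss(instances, other_modality, labels, temperature=0.1):
    """Cross-level term: each instance should match its own group in the other modality."""
    centroids = group_centroids(other_modality, labels)
    groups = sorted(centroids)
    keys = [normalize(centroids[g]) for g in groups]
    total = 0.0
    for v, g in zip(instances, labels):
        total += softmax_nll(normalize(v), keys, groups.index(g), temperature)
    return total / len(instances)


def dual_pathway_loss(local, neighbor, global_, gene, labels, weight=1.0):
    """Sum of the multi-scale instance-level term and the instance-group term."""
    scales = (local, neighbor, global_)
    multi_scale = sum(info_nce(img, gene) + info_nce(gene, img) for img in scales) / len(scales)
    # Assumption: local image embeddings stand in for "the image instance" here.
    cross_level = (instance_group_loss(local, gene, labels)
                   + instance_group_loss(gene, local, labels))
    return multi_scale + weight * cross_level
```

In this sketch, `local`, `neighbor`, `global_`, and `gene` are lists of equal-length embedding vectors, one per spot and already projected into a shared latent space, and `labels[i]` is the group of spot `i`. Well-aligned image and gene embeddings yield a loss near zero, while mismatched pairs are penalized.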