🤖 AI Summary
This study addresses two challenges in cross-species genotype–phenotype (G2P) mapping: morphological phenotypes are difficult to decode at scale, and conventional models are limited by oversimplified assumptions and small sample sizes. The authors propose what they describe as the first diffusion-based framework for multi-species G2P prediction. The method has three parts: (1) phenotype prediction is reformulated as a conditional image generation task; (2) an environment-augmented DNA sequence conditioner jointly encodes genetic and environmental context; and (3) a gene–phenotype alignment training strategy enforces cross-modal consistency. Experiments reportedly show improved accuracy in multi-species morphological phenotype prediction and sensitivity to subtle phenotypic differences induced by genetic variants. The approach is presented as a scalable, interpretable G2P modeling paradigm with potential applications in crop breeding, conservation biology, and precision medicine.
📝 Abstract
Discovering the genotype-phenotype relationship is crucial for genetic engineering and will facilitate advances in fields such as crop breeding, conservation biology, and personalized medicine. Current research usually focuses on a single species and on small datasets because phenotypic data are hard to collect, especially for traits that require visual assessment or physical measurement. Deciphering complex, composite phenotypes such as morphology from genetic data at scale remains an open problem. To move beyond traditional generic models that rely on simplified assumptions, this paper introduces G2PDiffusion, the first-of-its-kind diffusion model designed for genotype-to-phenotype generation across multiple species. Specifically, we use images to represent morphological phenotypes across species and redefine phenotype prediction as conditional image generation. To this end, we design an environment-enhanced DNA sequence conditioner and train a stable diffusion model with a novel alignment method to improve genotype-to-phenotype consistency. Extensive experiments demonstrate that our approach improves phenotype prediction accuracy across species and captures subtle genetic variations that contribute to observable traits.
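
To make the idea of an environment-enhanced DNA sequence conditioner more concrete, the following is a minimal sketch of how a DNA sequence and environmental context might be combined into a single conditioning vector for an image generator. It is not the G2PDiffusion implementation: the abstract does not specify the encoder, so the one-hot encoding, the mean pooling, the use of latitude and longitude as the environmental signal, and the function names `one_hot_dna`, `encode_environment`, and `build_condition` are all assumptions made for illustration.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot_dna(seq):
    """One-hot encode a DNA string; unknown bases (e.g. N) become all zeros."""
    return [
        [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        for base in seq.upper()
    ]


def encode_environment(lat_deg, lon_deg):
    """Map latitude/longitude to smooth periodic features.

    Assumption: the environment is summarized by a sampling location.
    The paper may use different or richer environmental inputs.
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.sin(lat), math.cos(lat), math.sin(lon), math.cos(lon)]


def build_condition(seq, lat_deg, lon_deg):
    """Mean-pool the one-hot DNA over positions and append environment features.

    A learned sequence encoder would normally replace the mean pooling;
    it is used here only to keep the sketch dependency-free.
    """
    encoded = one_hot_dna(seq)
    n = max(len(encoded), 1)  # avoid division by zero for an empty sequence
    pooled = [sum(row[i] for row in encoded) / n for i in range(len(NUCLEOTIDES))]
    return pooled + encode_environment(lat_deg, lon_deg)


# Hypothetical example: a short sequence sampled at latitude 0, longitude 90.
cond = build_condition("ACGTNACG", 0.0, 90.0)
```

In a conditional diffusion setup, a vector like `cond` would be passed to the denoising network at every step so that the generated image depends on both the genotype and the environment; how G2PDiffusion injects this conditioning is not described in the abstract.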