AI Summary
To address the high dimensionality, sparsity, strong biological heterogeneity, and sensitivity to graph-structural noise of single-cell RNA sequencing (scRNA-seq) data, this work proposes the first generative framework integrating graph representation learning with conditional score-based diffusion models, enabling cell-type-controllable, high-fidelity generation. Methodologically: (1) Laplacian positional encodings are introduced to better model topological relationships among cells; (2) an adversarial perturbation mechanism acting on graph edge weights in the spectral domain is designed to improve robustness to graph-structural variations; (3) the first conditional graph diffusion generative paradigm is established. Evaluated on multiple real-world batched scRNA-seq datasets, the method significantly outperforms existing generative models. The synthesized data exhibit high biological plausibility, precise cell-type specificity, and strong utility in downstream tasks, providing reliable synthetic data for applications such as cell-type annotation and perturbation response prediction.
Abstract
Generating high-fidelity and biologically plausible synthetic single-cell RNA sequencing (scRNA-seq) data, especially with conditional control, is challenging due to its high dimensionality, sparsity, and complex biological variations. Existing generative models often struggle to capture these unique characteristics and ensure robustness to structural noise in cellular networks. We introduce LapDDPM, a novel conditional Graph Diffusion Probabilistic Model for robust and high-fidelity scRNA-seq generation. LapDDPM uniquely integrates graph-based representations with a score-based diffusion model, enhanced by a novel spectral adversarial perturbation mechanism on graph edge weights. Our contributions are threefold: we leverage Laplacian Positional Encodings (LPEs) to enrich the latent space with crucial cellular relationship information; we develop a conditional score-based diffusion model for effective learning and generation from complex scRNA-seq distributions; and we employ a unique spectral adversarial training scheme on graph edge weights, boosting robustness against structural variations. Extensive experiments on diverse scRNA-seq datasets demonstrate LapDDPM's superior performance, achieving high fidelity and generating biologically plausible, cell-type-specific samples. LapDDPM sets a new benchmark for conditional scRNA-seq data generation, offering a robust tool for various downstream biological applications.
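To make the Laplacian Positional Encoding idea concrete, here is a minimal sketch of how such encodings are commonly computed: build a cell-cell graph, form the symmetric normalized Laplacian, and use its low-frequency eigenvectors as per-cell coordinates. This is not taken from the LapDDPM code. The k-nearest-neighbour graph construction, the function names `knn_adjacency` and `laplacian_positional_encoding`, and the parameters `k` and `dim` are all assumptions made for illustration, and the random matrix stands in for real expression data.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric, unweighted k-nearest-neighbour adjacency (assumed graph construction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between cells.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrize: keep an edge if either end chose it


def laplacian_positional_encoding(A, dim):
    """Return the `dim` smallest non-trivial eigenpairs of the normalized Laplacian."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the first eigenvector (eigenvalue ~0), which carries no positional signal
    # on a connected graph. A disconnected graph has several such eigenvectors.
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]


# Toy stand-in for a (cells x genes) matrix; real data would be normalized counts.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
A = knn_adjacency(X, k=4)
vals, lpe = laplacian_positional_encoding(A, dim=3)
print(lpe.shape)  # (30, 3)
```

Each row of `lpe` is a positional feature vector for one cell that could be concatenated with, or added to, that cell's latent representation. Eigenvectors are only defined up to sign, so implementations often randomize or canonicalize the sign during training; whether LapDDPM does so is not stated in the abstract.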