🤖 AI Summary
In early drug discovery, existing gene perturbation prediction methods model only mean expression levels, failing to capture cellular heterogeneity. This work introduces the first deep learning framework capable of predicting the full single-cell gene expression distribution—including variance, skewness, and kurtosis. Methodologically, it innovatively adopts gene-level histograms as output targets and integrates large language model–derived gene embeddings as biologically informed priors to enable generalization to unseen perturbations. Experiments demonstrate that our model significantly outperforms baselines in distributional modeling (−12.7% KL divergence), reduces training cost by 35%, and maintains state-of-the-art accuracy in mean expression prediction. By enabling high-fidelity, distribution-aware perturbation response modeling, this work establishes a more realistic and robust paradigm for target identification and functional interpretation in perturbation biology.
📝 Abstract
We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.