🤖 AI Summary
Predicting gRNA activity in CRISPR-Cas12 systems is hindered by limited labeled data, diverse PAM requirements, and reliance on large-scale task-specific training. Method: We propose a multimodal transfer learning framework that requires no domain-specific pre-training: (1) directly transferring sequence embeddings from RNA foundation models pre-trained on transcriptomic data (e.g., Nucleotide Transformer) to gRNA activity prediction; (2) integrating chromatin accessibility features from ATAC-seq and ChIP-seq data; and (3) employing a lightweight fully connected regressor with multi-task joint fine-tuning. Contribution/Results: Our approach consistently outperforms conventional baselines across multiple Cas12 variants and PAM sequences, achieving average R² improvements of 0.18–0.25. It remains robust in low-data regimes, supporting the efficacy and practicality of cross-modal transfer from biological foundation models to CRISPR functional prediction.
📝 Abstract
Predicting guide RNA (gRNA) activity is critical for effective CRISPR-Cas12 genome editing but remains challenging due to limited data, variation across protospacer adjacent motifs (PAMs; short sequence requirements for Cas binding), and reliance on large-scale training. We investigate whether a pre-trained biological foundation model, originally trained on transcriptomic data, can improve gRNA activity estimation even without domain-specific pre-training. Using embeddings from an existing RNA foundation model as input to a lightweight regressor, we show substantial gains over traditional baselines. We also integrate chromatin accessibility data to capture regulatory context, which further improves performance. Our results highlight the effectiveness of pre-trained foundation models and chromatin accessibility data for gRNA activity prediction.
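The pipeline described in the abstract (frozen sequence embeddings, an added chromatin accessibility feature, and a lightweight regressor) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins rather than outputs of any real foundation model, the accessibility signal and activity labels are synthetic, and a closed-form ridge regressor is swapped in for the lightweight regressor. The helper names `build_features`, `fit_ridge`, and `r2_score` are hypothetical.

```python
import numpy as np


def build_features(seq_embeddings, accessibility):
    """Concatenate per-gRNA sequence embeddings with accessibility features."""
    seq_embeddings = np.asarray(seq_embeddings, dtype=float)
    accessibility = np.asarray(accessibility, dtype=float)
    if accessibility.ndim == 1:
        # Treat a single accessibility score per gRNA as one extra column.
        accessibility = accessibility[:, None]
    return np.hstack([seq_embeddings, accessibility])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Assumption: stands in for the paper's lightweight regressor.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions): in practice `emb` would hold embeddings
# extracted from a pre-trained model and `acc` a measured accessibility signal.
rng = np.random.default_rng(0)
n, d = 200, 16
emb = rng.normal(size=(n, d))
acc = rng.uniform(size=n)
activity = 0.3 * (emb @ rng.normal(size=d)) + 2.0 * acc + rng.normal(scale=0.1, size=n)

X = build_features(emb, acc)
train, test = slice(0, 150), slice(150, None)
w, b = fit_ridge(X[train], activity[train])
r2 = r2_score(activity[test], X[test] @ w + b)
print("held-out R^2 on synthetic data:", r2)
```

Because the synthetic labels depend on the accessibility column by construction, the score printed here says nothing about real gRNA data; the sketch only shows how the two feature sources could be combined before regression.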