🤖 AI Summary
This work proposes mRNAutilus, a framework for end-to-end design of therapeutic mRNA transcripts that jointly optimizes coding sequences and untranslated regions (UTRs) to improve mRNA stability, translation efficiency, and protein expression. Combining a masked discrete diffusion model trained on millions of full-length mRNAs with Monte Carlo Tree Guidance, mRNAutilus performs codon optimization and de novo UTR design within a single generative process. Lightweight regressors operating on model embeddings predict half-life, translation efficiency, and protein abundance, and these predictions guide generation toward Pareto-efficient sequences under multiple objectives. In zero-shot experiments, mRNAutilus-generated P. pyralis luciferase mRNAs achieved over 400-fold higher protein expression than wild-type constructs, while SARS-CoV-2 Spike mRNAs exceeded clinically used and commercial constructs. The framework also enhanced expression in further therapeutic settings, including prime editing (PEMax) and peptide-guided E3 ligases (uAbs) for beta-catenin degradation.
📝 Abstract
Therapeutic mRNA design requires coordinating multiple interacting sequence features across the full transcript, where codon usage, untranslated regions (UTRs), and their coupling jointly determine stability, translation efficiency, and protein expression. Here, we present mRNA generation via unrolled trajectories and informed latent updates (mRNAutilus), a framework for simultaneous codon optimization and de novo UTR design directly from sequence. mRNAutilus combines a masked discrete diffusion model trained on millions of full-length mRNAs with Monte Carlo Tree Guidance to generate Pareto-efficient sequences under multiple functional objectives, using lightweight regressors over model embeddings to predict half-life, translation efficiency, and protein abundance. Unlike recent methods that design coding sequences and UTRs separately or rely on post hoc assembly and screening, mRNAutilus generates complete transcripts in a single process that is jointly optimized across these properties. Zero-shot designs perform well across diverse targets: mRNAs encoding P. pyralis luciferase achieve over 400-fold higher expression than wild-type and outperform commercial and machine learning-designed baselines, including zero-shot generative approaches. Zero-shot SARS-CoV-2 Spike mRNAs exceed clinically used and commercial constructs and match or surpass lab-optimized designs with improved durability. We further demonstrate generality in therapeutic settings, including prime editing (PEMax) and programmable proteome modulation, where mRNAutilus-designed constructs enhance expression of peptide-guided E3 ligases (uAbs) for beta-catenin degradation. These results establish a sequence-based, multi-objective framework for generating functional mRNAs tailored to diverse biological applications.
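
To make "Pareto-efficient" concrete, the following is a minimal sketch of non-dominated filtering over candidate sequences scored on three objectives. It is not the authors' implementation and does not reproduce Monte Carlo Tree Guidance; the function names, the candidate labels, and the score values are all hypothetical, and every objective is assumed to be "higher is better".

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical predicted scores per candidate, in the order
# (half-life, translation efficiency, protein abundance).
# These numbers are invented for illustration only.
candidates = {
    "seq_a": (0.9, 0.4, 0.7),
    "seq_b": (0.6, 0.8, 0.6),
    "seq_c": (0.5, 0.3, 0.5),  # worse than seq_a on every objective
    "seq_d": (0.9, 0.4, 0.6),  # ties seq_a on two objectives, loses on the third
}

front = pareto_front(candidates)
print(front)
```

Here `seq_a` and `seq_b` both survive because each beats the other on at least one objective, which is the trade-off a Pareto front is meant to preserve rather than collapse into a single weighted score.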