Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

📅 2026-03-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses adeno-associated virus (AAV) capsid design for gene therapy, which is hindered by the vast sequence space and limited experimental screening capacity. The authors combine reinforcement learning with protein language models: a pretrained model is fine-tuned on experimentally validated capsid sequences, and a reward function that jointly promotes predicted viability and sequence novelty then guides generation. This lets the model explore beyond the training data distribution while keeping predicted viability high. Compared with fine-tuning alone, which stays biased toward the training distribution, reinforcement learning-guided generation reaches more distant regions of sequence space. The authors also propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation.
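As a rough illustration of how a candidate ranking over several properties could work, the sketch below filters candidates by a viability threshold and sorts the rest by a weighted sum. Everything here is an assumption made for illustration: the `Candidate` fields, the weights, the threshold, and the single `biophysical` score are hypothetical and are not taken from the paper, which may combine these properties quite differently.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical record; all scores are assumed to be scaled to [0, 1]."""
    sequence: str
    viability: float    # predicted viability
    novelty: float      # distance from known sequences
    biophysical: float  # aggregate biophysical score (assumed)


def rank_candidates(
    candidates: Sequence[Candidate],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # illustrative weights
    min_viability: float = 0.5,                              # illustrative cutoff
) -> List[Candidate]:
    """Drop low-viability candidates, then sort by weighted score, best first."""
    w_v, w_n, w_b = weights

    def score(c: Candidate) -> float:
        return w_v * c.viability + w_n * c.novelty + w_b * c.biophysical

    kept = [c for c in candidates if c.viability >= min_viability]
    return sorted(kept, key=score, reverse=True)


pool = [
    Candidate("AAAC", viability=0.9, novelty=0.2, biophysical=0.5),
    Candidate("CCDD", viability=0.7, novelty=0.8, biophysical=0.6),
    Candidate("EEFF", viability=0.3, novelty=0.9, biophysical=0.9),
]
ranked = rank_candidates(pool)
```

With these made-up numbers, the third candidate is filtered out despite its high novelty, which is the kind of trade-off a viability cutoff is meant to enforce.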

📝 Abstract

Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental screening. Machine-guided generative approaches provide a powerful means of navigating this landscape and proposing novel protein sequences that satisfy functional constraints. Here, we develop a generative design framework based on protein language models and reinforcement learning to generate highly novel yet functionally plausible AAV capsids. A pretrained model was fine-tuned on experimentally validated capsid sequences to learn patterns associated with viability. Reinforcement learning was then used to guide sequence generation, with a reward function that jointly promoted predicted viability and sequence novelty, thereby enabling exploration beyond regions represented in the training data. Comparative analyses showed that fine-tuning alone produces sequences with high predicted viability but remains biased toward the training distribution, whereas reinforcement learning-guided generation reaches more distant regions of sequence space while maintaining high predicted viability. Finally, we propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation. This work establishes a framework for the generative exploration of protein sequence space and advances the application of generative protein language models to AAV bioengineering.
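A reward that jointly promotes predicted viability and sequence novelty could take many forms, and the abstract does not specify one. The minimal sketch below assumes equal-length sequences, measures novelty as the Hamming distance fraction to the nearest reference sequence, and mixes the two terms linearly. The function names, the `novelty_weight` parameter, and the stand-in viability predictor are all hypothetical.

```python
from typing import Callable, Iterable


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def novelty(seq: str, references: Iterable[str]) -> float:
    """Distance to the nearest reference: 0 means identical to a known sequence."""
    return min(hamming_fraction(seq, ref) for ref in references)


def reward(
    seq: str,
    references: Iterable[str],
    viability_fn: Callable[[str], float],  # assumed to return a score in [0, 1]
    novelty_weight: float = 0.5,           # illustrative trade-off, not from the paper
) -> float:
    """Linear mix of predicted viability and novelty (an assumed form)."""
    v = viability_fn(seq)
    n = novelty(seq, references)
    return (1.0 - novelty_weight) * v + novelty_weight * n


# Toy usage with a constant stand-in for a learned viability predictor.
training_set = ["AAAA", "AACC"]
r = reward("AAAC", training_set, viability_fn=lambda s: 0.8)
```

Setting `novelty_weight` to 0 in this sketch rewards predicted viability alone, while larger values push generation further from the reference sequences at the risk of lower viability.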

Problem

Research questions and friction points this paper is trying to address.

AAV capsid design

protein sequence space

generative modeling

sequence novelty

functional viability

Innovation

Methods, ideas, or system contributions that make the work stand out.

protein language models

reinforcement learning

de novo protein design

AAV capsid engineering

sequence novelty
