Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

📅 2026-03-19
🤖 AI Summary
This work addresses adeno-associated virus (AAV) capsid design for gene therapy, where the sequence space is far larger than what experimental screening can cover. The authors combine reinforcement learning with a protein language model: a pretrained model is fine-tuned on experimentally validated capsid sequences, and a reward that jointly promotes predicted viability and sequence novelty then guides generation beyond the training data distribution. Compared with fine-tuning alone, which stays biased toward the training distribution, the reinforcement learning-guided model reaches more distant regions of sequence space while maintaining high predicted viability. The authors also propose a candidate ranking scheme that combines predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation.
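The candidate ranking idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Candidate` fields, the linear weighting, the viability cutoff, and the single aggregated `biophysics` score are hypothetical and are not taken from the paper, which does not specify its scoring formula here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    viability: float   # predicted viability in [0, 1] (hypothetical predictor output)
    novelty: float     # distance from the training data in [0, 1]
    biophysics: float  # aggregated biophysical score in [0, 1] (placeholder)


def composite_score(c: Candidate, weights: Tuple[float, float, float]) -> float:
    """Weighted sum of the three criteria (an assumed, not published, formula)."""
    w_viability, w_novelty, w_biophysics = weights
    return (w_viability * c.viability
            + w_novelty * c.novelty
            + w_biophysics * c.biophysics)


def rank_candidates(
    candidates: Iterable[Candidate],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),
    min_viability: float = 0.5,
) -> List[Candidate]:
    """Drop candidates below the viability cutoff, then sort best-first."""
    kept = [c for c in candidates if c.viability >= min_viability]
    return sorted(kept, key=lambda c: composite_score(c, weights), reverse=True)


pool = [
    Candidate("a", viability=0.9, novelty=0.2, biophysics=0.5),
    Candidate("b", viability=0.7, novelty=0.8, biophysics=0.6),
    Candidate("c", viability=0.3, novelty=0.9, biophysics=0.9),
]
ranked_names = [c.name for c in rank_candidates(pool)]
```

In this toy pool, candidate `c` is filtered out despite its high novelty because its predicted viability falls below the assumed cutoff, which mirrors the stated goal of keeping novelty from overriding functional plausibility.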

📝 Abstract
Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental screening. Machine-guided generative approaches provide a powerful means of navigating this landscape and proposing novel protein sequences that satisfy functional constraints. Here, we develop a generative design framework based on protein language models and reinforcement learning to generate highly novel yet functionally plausible AAV capsids. A pretrained model was fine-tuned on experimentally validated capsid sequences to learn patterns associated with viability. Reinforcement learning was then used to guide sequence generation, with a reward function that jointly promoted predicted viability and sequence novelty, thereby enabling exploration beyond regions represented in the training data. Comparative analyses showed that fine-tuning alone produces sequences with high predicted viability but remains biased toward the training distribution, whereas reinforcement learning-guided generation reaches more distant regions of sequence space while maintaining high predicted viability. Finally, we propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation. This work establishes a framework for the generative exploration of protein sequence space and advances the application of generative protein language models to AAV bioengineering.
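The reward described in the abstract, which jointly promotes predicted viability and sequence novelty, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the reward's functional form, so the weighted sum, the `novelty_weight` parameter, the Hamming-based novelty measure, and the `viability_fn` stand-in for a viability predictor are all hypothetical.

```python
from typing import Callable, Sequence


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of differing positions; a length mismatch counts as differences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def novelty(candidate: str, training_set: Sequence[str]) -> float:
    """Distance to the closest training sequence (0.0 means an exact match)."""
    return min(hamming_fraction(candidate, ref) for ref in training_set)


def reward(
    candidate: str,
    training_set: Sequence[str],
    viability_fn: Callable[[str], float],
    novelty_weight: float = 0.5,
) -> float:
    """Assumed reward: convex combination of predicted viability and novelty."""
    viability = viability_fn(candidate)  # expected to lie in [0, 1]
    return ((1.0 - novelty_weight) * viability
            + novelty_weight * novelty(candidate, training_set))


# Toy usage with a constant stand-in for the viability predictor.
training = ["ACDE", "ACDF"]
toy_reward = reward("ACGG", training, viability_fn=lambda seq: 0.8)
```

With a reward of this shape, a sequence copied from the training set earns no novelty term, so the policy is pushed away from the training distribution only as far as predicted viability allows.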
Problem

Research questions and friction points this paper is trying to address.

AAV capsid design
protein sequence space
generative modeling
sequence novelty
functional viability
Innovation

Methods, ideas, or system contributions that make the work stand out.

protein language models
reinforcement learning
de novo protein design
AAV capsid engineering
sequence novelty
Lucas Ferraz
LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
Ana F. Rodrigues
LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
Pedro Giesteira Cotovio
LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
Mafalda Ventura
iBET – Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal
Gabriela Silva
iBET – Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal
Ana Sofia Coroadinha
iBET – Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal
Miguel Machuqueiro
Assistant Professor at Faculdade de Ciências, Universidade de Lisboa
Computational Biophysics, Molecular Modeling and Simulation, pH Effects on Molecular Structure and
Catia Pesquita
LASIGE, Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal
AI for Science, Knowledge Graphs, Bioinformatics, Ontology Matching