Directed Evolution of Proteins via Bayesian Optimization in Embedding Space

📅 2025-09-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Directed evolution is bottlenecked by the cost and time required for high-throughput biochemical screening, necessitating intelligent strategies to improve variant selection efficiency. This paper introduces a novel framework integrating pretrained protein language model (pLM) sequence embeddings with Bayesian optimization (BO): pLMs map protein sequences into a low-dimensional semantic space, yielding differentiable, information-rich representations; BO is then deployed in this space, using acquisition functions to iteratively recommend high-potential variants. By circumventing the strong dependence on labeled data inherent in conventional regression-based modeling, our approach significantly increases information gain per experimental round. On multiple protein functional optimization benchmarks, our method identifies variants with higher activity under identical experimental budgets, achieves 37–52% faster convergence, and reduces total screening burden by ~40%, outperforming state-of-the-art regression-driven methods.

Technology Category

Application Category

📝 Abstract
Directed evolution is an iterative laboratory process of designing proteins with improved function by iteratively synthesizing new protein variants and evaluating their desired property with expensive and time-consuming biochemical screening. Machine learning methods can help select informative or promising variants for screening to increase their quality and reduce the amount of necessary screening. In this paper, we present a novel method for machine-learning-assisted directed evolution of proteins which combines Bayesian optimization with informative representation of protein variants extracted from a pre-trained protein language model. We demonstrate that the new representation based on the sequence embeddings significantly improves the performance of Bayesian optimization yielding better results with the same number of conducted screening in total. At the same time, our method outperforms the state-of-the-art machine-learning-assisted directed evolution methods with regression objective.
Problem

Research questions and friction points this paper is trying to address.

Optimizing protein function via directed evolution
Reducing costly biochemical screening in protein design
Improving Bayesian optimization with protein language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian optimization in embedding space
Pre-trained protein language model representation
Improved screening efficiency and performance
🔎 Similar Papers
No similar papers found.
M
Matouš Soldát
Department of Computer Science, FEE, Czech Technical University in Prague
Jiří Kléma
Jiří Kléma
Czech Technical University in Prague
Machine LearningData MiningBioinformatics