Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search

📅 2025-11-13
🤖 AI Summary
Current in silico directed evolution algorithms fail to effectively leverage evolutionary priors encoded in protein language models (PLMs). To address this, we propose AlphaDE: a framework that first fine-tunes a pre-trained PLM on homologous sequences via masked language modeling, then applies Monte Carlo tree search (MCTS) at test time so that the fine-tuned PLM guides exploration of evolutionary trajectories. This work incorporates large-scale PLMs into the directed evolution pipeline, and a case study shows that it supports condensing the protein sequence space through computational evolution. In benchmark experiments, AlphaDE outperforms previous state-of-the-art methods even with few-shot fine-tuning, indicating that evolutionary patterns learned by PLMs provide effective, data-driven guidance for computational protein evolution.

📝 Abstract
Protein evolution through amino acid sequence mutations is a cornerstone of life sciences. Current in silico directed evolution algorithms focus on designing search strategies, but they overlook how protein language models, which encode rich evolutionary patterns, can be used to guide the search. To bridge this gap, we propose AlphaDE, a novel framework that evolves protein sequences by harnessing paradigms from large language models. First, AlphaDE fine-tunes pretrained protein language models with masked language modeling on homologous protein sequences to activate evolutionary plausibility for the protein class of interest. Second, AlphaDE introduces test-time inference based on Monte Carlo tree search, which evolves proteins under evolutionary guidance from the fine-tuned protein language model. Extensive benchmark experiments show that AlphaDE substantially outperforms previous state-of-the-art methods, even with few-shot fine-tuning. A case study further shows that AlphaDE supports condensing the protein sequence space through computational evolution.
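To make the first step concrete, the sketch below shows how a masked language modeling training example could be built from a homologous sequence: a subset of residues is hidden and the model is trained to recover them. This is only an illustration of the general objective, not AlphaDE's implementation. The names `mask_sequence`, `MASK`, the 15% default mask rate, and the toy homologs are all assumptions introduced here.

```python
import random

MASK = "<mask>"  # assumed placeholder token; real PLM tokenizers define their own


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Build one masked-language-modeling example from a protein sequence.

    Returns (tokens, labels):
      tokens: residues with a random subset replaced by MASK
      labels: the original residue at each masked position, None elsewhere
    """
    rng = rng or random.Random()
    # Mask at least one residue, but never more than the sequence holds.
    n_mask = min(len(seq), max(1, int(len(seq) * mask_rate)))
    positions = set(rng.sample(range(len(seq)), n_mask))
    tokens = [MASK if i in positions else aa for i, aa in enumerate(seq)]
    labels = [aa if i in positions else None for i, aa in enumerate(seq)]
    return tokens, labels


# Hypothetical toy homologs, used only to exercise the function.
homologs = ["MKTAYIAKQR", "MKTAYLAKQR", "MRTAYIAKQK"]
rng = random.Random(0)
examples = [mask_sequence(s, rng=rng) for s in homologs]
for tokens, labels in examples:
    print(" ".join(tokens))
```

During fine-tuning, a model would be scored only on the positions where `labels` is not `None`, which is what pushes it toward the residue preferences of the protein family it sees.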
Problem

Research questions and friction points this paper is trying to address.

Utilizing protein language models to guide evolutionary search strategies
Evolving protein sequences with evolutionary plausibility and guidance
Condensing protein sequence space through computational directed evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes protein language models using masked language modeling
Introduces Monte Carlo tree search for evolutionary guidance
Condenses protein sequence space through computational evolution
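The tree-search idea can be sketched with a generic Monte Carlo tree search over single-point mutations. This summary does not specify AlphaDE's selection rule, reward, or how the fine-tuned PLM scores candidates, so everything below is an assumption: the UCT formula is the textbook one, `propose_mutations` samples mutations at random where a PLM would rank them, and `toy_fitness` with its `TARGET` sequence is a hypothetical stand-in for a real fitness oracle.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq
        self.parent = parent
        self.children = {}  # (position, residue) -> Node
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def propose_mutations(seq, rng, k=4):
    """Stand-in for PLM guidance: k distinct random single-point mutations."""
    moves = set()
    while len(moves) < k:
        pos = rng.randrange(len(seq))
        aa = rng.choice(AMINO_ACIDS)
        if aa != seq[pos]:
            moves.add((pos, aa))
    return sorted(moves)


def apply_mutation(seq, move):
    pos, aa = move
    return seq[:pos] + aa + seq[pos + 1:]


def uct_select(node, c=1.4):
    """Pick the child with the best mean value plus exploration bonus."""
    def score(child):
        bonus = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        return child.mean_value() + bonus
    return max(node.children.values(), key=score)


def mcts(root_seq, fitness, n_iter=200, max_depth=3, seed=0):
    rng = random.Random(seed)
    root = Node(root_seq)
    best_seq, best_fit = root_seq, fitness(root_seq)
    for _ in range(n_iter):
        node, depth = root, 0
        # Selection: follow UCT through already-expanded nodes.
        while node.children and depth < max_depth:
            node = uct_select(node)
            depth += 1
        # Expansion: add candidate mutations below the reached leaf.
        if depth < max_depth:
            for move in propose_mutations(node.seq, rng):
                child_seq = apply_mutation(node.seq, move)
                node.children[move] = Node(child_seq, parent=node)
            node = rng.choice(list(node.children.values()))
        # Evaluation: score the sequence at the reached node.
        reward = fitness(node.seq)
        if reward > best_fit:
            best_seq, best_fit = node.seq, reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_seq, best_fit


TARGET = "MKTAYIAKQR"  # hypothetical optimum for the toy fitness below


def toy_fitness(seq):
    """Fraction of positions matching TARGET (illustration only)."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


best_seq, best_fit = mcts("MKTAAAAKQR", toy_fitness)
print(best_seq, best_fit)
```

Because each tree level applies one mutation, `max_depth` caps how far any proposed sequence can drift from the starting sequence. Replacing the random proposals with PLM-ranked ones is where the evolutionary guidance described above would enter.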
Authors
Yaodong Yang
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Yang Wang
Hangzhou Institute of Medicine, Chinese Academy of Sciences
Jinpeng Li
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Pei Guo
Soochow University
LLMs, Natural Language Generation
Da Han
Professor, Shanghai Jiao Tong University, Hangzhou Institute of Medicine (HIM), CAS
DNA nanotechnology, Biosensors, Nanofabrications
Guangyong Chen
Hangzhou Institute of Medicine, Chinese Academy of Sciences
P. Heng
Department of Computer Science and Engineering, The Chinese University of Hong Kong