Leveraging Large Language Models for enzymatic reaction prediction and characterization

📅 2025-05-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses enzymatic reaction prediction—a core challenge in biochemical modeling—by proposing the first unified multitask framework based on Llama-3.1 (8B/70B), jointly modeling EC number prediction, forward synthesis, and retrosynthesis. The method employs multitask learning, parameter-efficient fine-tuning via LoRA, and structured prompt engineering to enhance generalization under low-resource conditions. It is the first to systematically identify and characterize inherent limitations of large language models (LLMs) in hierarchical EC classification, thereby advancing biochemical knowledge representation paradigms. Experiments demonstrate consistent superiority over single-task baselines across both forward and retrosynthetic tasks; notably, the approach maintains robust performance under few-shot settings. These results substantiate that LLMs can effectively encode, retain, and transfer enzyme-specific biochemical knowledge, offering a scalable foundation for data-scarce enzymology applications.

Technology Category

Application Category

📝 Abstract
Predicting enzymatic reactions is crucial for applications in biocatalysis, metabolic engineering, and drug discovery, yet it remains a complex and resource-intensive task. Large Language Models (LLMs) have recently demonstrated remarkable success in various scientific domains, e.g., through their ability to generalize knowledge, reason over complex structures, and leverage in-context learning strategies. In this study, we systematically evaluate the capability of LLMs, particularly the Llama-3.1 family (8B and 70B), across three core biochemical tasks: Enzyme Commission number prediction, forward synthesis, and retrosynthesis. We compare single-task and multitask learning strategies, employing parameter-efficient fine-tuning via LoRA adapters. Additionally, we assess performance across different data regimes to explore their adaptability in low-data settings. Our results demonstrate that fine-tuned LLMs capture biochemical knowledge, with multitask learning enhancing forward- and retrosynthesis predictions by leveraging shared enzymatic information. We also identify key limitations, for example challenges in hierarchical EC classification schemes, highlighting areas for further improvement in LLM-driven biochemical modeling.
Problem

Research questions and friction points this paper is trying to address.

Predicting enzymatic reactions for biocatalysis and drug discovery
Evaluating LLMs for biochemical tasks like EC number prediction
Assessing multitask learning for enzymatic synthesis predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizing LLMs for enzymatic reaction prediction
Employing LoRA adapters for efficient fine-tuning
Multitask learning enhances synthesis predictions
🔎 Similar Papers
No similar papers found.
L
Lorenzo Di Fruscia
Department of Intelligent Systems, Delft University of Technology, Delft 2629 HZ, The Netherlands
Jana Marie Weber
Jana Marie Weber
Assistant professor, TU Delft
(bio)chemical reaction networkssustainabilitygraph machine learningnetwork science