🤖 AI Summary
To address the challenge of automating molecular dynamics (MD) simulation workflows for biomolecular systems, this paper introduces MDCrow, an agent powered by large language models (LLMs) such as GPT-4o and Llama3-405B. MDCrow uses chain-of-thought reasoning to coordinate over 40 expert-designed tools, covering file handling and preprocessing, simulation setup, analysis of simulation outputs, and retrieval of information from literature and databases. The agent is evaluated on 25 MD tasks that vary in the number of required subtasks and in difficulty, and its robustness to both difficulty and prompt style is measured. GPT-4o completes complex tasks with low variance, and the open-weight Llama3-405B follows closely, indicating that open models can be competitive on demanding, domain-specific agent tasks. Prompt style does not affect the best models' performance but has a significant effect on smaller models.
📝 Abstract
Molecular dynamics (MD) simulations are essential for understanding biomolecular systems but remain challenging to automate. Recent advances in large language models (LLMs) have demonstrated success in automating complex scientific tasks using LLM-based agents. In this paper, we introduce MDCrow, an agentic LLM assistant capable of automating MD workflows. MDCrow uses chain-of-thought over 40 expert-designed tools for handling and processing files, setting up simulations, analyzing the simulation outputs, and retrieving relevant information from literature and databases. We assess MDCrow's performance across 25 tasks of varying required subtasks and difficulty, and we evaluate the agent's robustness to both difficulty and prompt style. gpt-4o is able to complete complex tasks with low variance, followed closely by llama3-405b, a compelling open-source model. While prompt style does not influence the best models' performance, it has significant effects on smaller models.