🤖 AI Summary
Existing biomedical agents rely on flat tool descriptions to handle heterogeneous bioinformatics tools, which leads to context inflation, tool ambiguity, and inefficient execution. This work proposes BioManus, a system that combines a Model Context Protocol (MCP)-native architecture with graph-based planning. BioManus uses a BioinfoMCP compiler to standardize diverse tools into MCP services and organizes them as a typed heterogeneous graph. During inference, it retrieves task-specific subgraphs and generates operation-level workflow skeletons, decoupling planning complexity from the number of available tools. This approach reduces context requirements while improving planning stability and execution accuracy. Evaluated on BioAgentBench and LAB-Bench, BioManus outperforms state-of-the-art baselines in workflow validity, execution accuracy, and context efficiency.
📝 Abstract
Biomedical agents promise to automate complex biological workflows, yet current systems face two fundamental bottlenecks: bioinformatics tools are highly heterogeneous in their interfaces and execution environments, and agent planning still relies on flat, prompt-retrieved tool descriptions. As biomedical software ecosystems grow, this coupling between tool coverage and context size leads to tool confusion, unstable planning, and inefficient execution. We introduce BioManus, an MCP-native biomedical agent built on graph-scaffolded planning over structured biological capabilities. Its first component is the BioinfoMCP Compiler, which converts heterogeneous bioinformatics software into standardized MCP servers, yielding a large executable MCP ecosystem. BioManus then organizes this ecosystem as a typed heterogeneous MCP graph over tools, operations, datatypes, and workflow stages. At inference time, BioManus retrieves compact task-specific subgraphs and synthesizes operation-level workflow scaffolds. This design decouples planning complexity from raw tool inventory size, achieving a context compression ratio of $\Theta(N / (h \cdot \bar{m}))$ under high-recall retrieval, where $N$ is the total tool count, $h$ is the workflow horizon, and $\bar{m} \ll N$ is the average number of candidate tools per operation. Experiments on BioAgentBench and LAB-Bench show that BioManus improves execution accuracy, workflow validity, and context efficiency over advanced biomedical agent baselines. This work suggests a paradigm shift: scalable biomedical reasoning requires structured executable capability graphs rather than ever-larger prompt-level tool retrieval.
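To make the compression ratio concrete, the following is a minimal sketch rather than BioManus's actual implementation. It assumes a toy graph with only two node types (operations and tools), synthetic tool names, and a hypothetical `retrieve_subgraph` helper that keeps the first few candidates per operation in place of a real retriever. It compares a flat context holding all N tools against a scaffolded context holding roughly h times the average number of candidates per operation.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityGraph:
    """Toy capability graph: each operation node links to the tools implementing it.

    Hypothetical structure for illustration; the graph described in the abstract
    also has datatype and workflow-stage nodes, which are omitted here.
    """
    tools_by_operation: dict[str, list[str]] = field(default_factory=dict)

    def add_tool(self, operation: str, tool: str) -> None:
        self.tools_by_operation.setdefault(operation, []).append(tool)

    def tool_count(self) -> int:
        # N: number of distinct tools in the whole inventory.
        return len({t for tools in self.tools_by_operation.values() for t in tools})

    def retrieve_subgraph(self, workflow: list[str], max_candidates: int) -> dict[str, list[str]]:
        # Stand-in for retrieval: keep the first `max_candidates` tools per operation.
        return {op: self.tools_by_operation.get(op, [])[:max_candidates] for op in workflow}


def compression_ratio(graph: CapabilityGraph, workflow: list[str], max_candidates: int) -> float:
    """Ratio of flat context size (N tools) to scaffolded context size (about h * m_bar)."""
    subgraph = graph.retrieve_subgraph(workflow, max_candidates)
    candidates_in_context = sum(len(tools) for tools in subgraph.values())
    return graph.tool_count() / candidates_in_context


# Synthetic inventory (assumption): 6 operations with 20 tools each, so N = 120.
graph = CapabilityGraph()
for op_index in range(6):
    for tool_index in range(20):
        graph.add_tool(f"operation_{op_index}", f"tool_{op_index}_{tool_index}")

# A workflow with horizon h = 3 and 2 candidate tools kept per operation.
workflow = ["operation_0", "operation_3", "operation_5"]
ratio = compression_ratio(graph, workflow, max_candidates=2)
print(ratio)  # 120 / (3 * 2) = 20.0
```

In this toy setting the planner sees 6 candidate tools instead of 120. Adding more tools to the inventory increases N but leaves the scaffolded context unchanged as long as the horizon and the per-operation candidate count stay fixed, which is the decoupling the abstract describes.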