🤖 AI Summary
AI-driven, non-deterministic control flow in Model Context Protocol (MCP) servers introduces new risks to sustainability, security, and maintainability. Method: We conduct the first large-scale empirical assessment of 1,899 open-source MCP servers, using a hybrid evaluation pipeline that combines a general-purpose static analysis tool with an MCP-specific scanner. Contribution/Results: Our analysis identifies eight distinct vulnerability classes, only three of which overlap with traditional software vulnerabilities; the remaining five are specific to MCP. We find that 7.2% of servers contain general vulnerabilities, 5.5% exhibit MCP-specific tool poisoning, 66% exhibit code smells, and 14.4% contain ten bug patterns that overlap with prior research. The study shows the need for MCP-specific detection techniques and provides both a methodological foundation and empirical evidence for building robust, trustworthy MCP ecosystems.
📝 Abstract
Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, their reliance on textual interfaces limits real-world interaction. To address this, FM providers introduced tool calling, triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem; MCP has since become the de facto standard, with over eight million weekly SDK downloads. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination. To this end, we present the first large-scale empirical study of MCP. Using state-of-the-art health metrics and a hybrid analysis pipeline that combines a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Although MCP servers demonstrate strong health metrics, we identify eight distinct vulnerabilities, only three of which overlap with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, 66% of servers exhibit code smells and 14.4% contain ten bug patterns that overlap with prior research. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices.