🤖 AI Summary
This study systematically uncovers a novel attack surface in connected medical AI agents operating in open-tool-calling scenarios. Specifically, it addresses their vulnerability to adversarial prompt injection attacks when interfacing with the Internet via web-browsing tools. To this end, we propose: (1) a realistic web-interaction-based adversarial prompt injection methodology; (2) a cross-model vulnerability assessment framework; and (3) a tool-augmented behavioral auditing mechanism. We identify four previously undocumented attack patterns—information pollution, recommendation hijacking, privacy exfiltration, and system takeover. Empirical evaluation demonstrates that mainstream LLM-powered medical agents—including those driven by DeepSeek-R1—are consistently vulnerable, achieving an average attack success rate exceeding 85%. Our work establishes the first empirically grounded benchmark and reproducible risk taxonomy for medical AI security alignment, red-teaming, and trustworthy tool-calling design.
📝 Abstract
Large language models (LLMs)-powered AI agents exhibit a high level of autonomy in addressing medical and healthcare challenges. With the ability to access various tools, they can operate within an open-ended action space. However, with the increase in autonomy and ability, unforeseen risks also arise. In this work, we investigated one particular risk, i.e., cyber attack vulnerability of medical AI agents, as agents have access to the Internet through web browsing tools. We revealed that through adversarial prompts embedded on webpages, cyberattackers can: i) inject false information into the agent's response; ii) they can force the agent to manipulate recommendation (e.g., healthcare products and services); iii) the attacker can also steal historical conversations between the user and agent, resulting in the leak of sensitive/private medical information; iv) furthermore, the targeted agent can also cause a computer system hijack by returning a malicious URL in its response. Different backbone LLMs were examined, and we found such cyber attacks can succeed in agents powered by most mainstream LLMs, with the reasoning models such as DeepSeek-R1 being the most vulnerable.