🤖 AI Summary
This work identifies a novel, query-agnostic indirect prompt injection (IPI) threat in modern IDE-integrated coding agents: attackers exploit leakage of the agent's internal prompt to craft transferable malicious tool descriptions that reliably trigger privilege escalation without relying on specific user inputs. Unlike prior IPI studies, which target query-specific behaviors, the authors formally model this attack as a constrained white-box optimization problem and propose QueryIPI, the first query-agnostic IPI method tailored to coding agents. The approach iteratively refines tool descriptions within a simulated agent environment, using feedback from each attempt to improve robustness. Experiments show that the method achieves up to an 87% attack success rate across five mainstream coding agents, substantially outperforming baseline approaches. Crucially, the generated malicious tool descriptions exhibit strong cross-system transferability and remain effective in real-world IDE deployments, underscoring their practical security impact.
📝 Abstract
Modern coding agents integrated into IDEs combine powerful tools with system-level actions, exposing a high-stakes attack surface. Existing Indirect Prompt Injection (IPI) studies focus mainly on query-specific behaviors, yielding unstable attacks with low success rates. We identify a more severe, query-agnostic threat that remains effective across diverse user inputs. This stronger attack becomes tractable by exploiting a common vulnerability: leakage of the agent's internal prompt, which turns the attack into a constrained white-box optimization problem. We present QueryIPI, the first query-agnostic IPI method for coding agents. QueryIPI refines malicious tool descriptions through an iterative, prompt-based process informed by the leaked internal prompt. Experiments on five simulated agents show that QueryIPI achieves up to 87 percent success and outperforms baselines; the generated malicious descriptions also transfer to real-world systems, highlighting a practical security risk to modern LLM-based coding agents.
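The iterative, prompt-based refinement process described above can be sketched as a simple optimization loop. This is an illustrative assumption, not the authors' implementation: the helper callables (`simulate_agent`, `refine_description`) and the success criterion are hypothetical stand-ins for the paper's simulated agent environment and feedback mechanism.

```python
def optimize_tool_description(leaked_prompt, seed_description, user_queries,
                              simulate_agent, refine_description,
                              max_iters=10):
    """Iteratively refine a malicious tool description until it triggers the
    target behavior for every sampled user query (query-agnostic success).

    simulate_agent(leaked_prompt, description, query) -> object with an
        `attack_succeeded` attribute (one simulated agent run).
    refine_description(description, leaked_prompt, feedback) -> str
        (rewrites the description using the failed runs as feedback).
    """
    description = seed_description
    for _ in range(max_iters):
        # Run the simulated agent on each query with the current candidate.
        feedback = [simulate_agent(leaked_prompt, description, q)
                    for q in user_queries]
        if all(f.attack_succeeded for f in feedback):
            # Candidate succeeds across all queries: query-agnostic.
            return description
        # Otherwise, use the failures to guide the next rewrite.
        description = refine_description(description, leaked_prompt, feedback)
    return description  # best effort after the iteration budget is spent
```

In the paper's setting, `refine_description` would itself be an LLM prompt conditioned on the leaked internal prompt, which is what makes the optimization white-box and constrained.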