🤖 AI Summary
This work addresses the vulnerability of large language model (LLM) agents to indirect prompt injection attacks during interaction with external environments—a threat inadequately exposed by existing red-teaming methods due to their inability to trace root causes and propagation pathways. To bridge this gap, we propose PI-Hunter, the first automated agent auditing framework capable of pinpointing attack origins and tracking exploitation paths. PI-Hunter integrates source-aware test generation, feedback-driven evolutionary search, multi-turn interaction simulation, and adversarial environment modeling to proactively induce and reveal malicious instructions embedded in the environment. Empirical evaluations demonstrate that PI-Hunter substantially outperforms current automated red-teaming approaches across diverse agent architectures, benchmark tasks, and defense mechanisms, achieving significantly higher vulnerability exposure rates and attack surface coverage—even against established defenses—thereby transcending the conventional evaluation paradigm centered solely on attack success rates.
📝 Abstract
Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.