PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

📅 2026-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of large language model (LLM) agents to indirect prompt injection attacks during interaction with external environments—a threat inadequately exposed by existing red-teaming methods due to their inability to trace root causes and propagation pathways. To bridge this gap, we propose PI-Hunter, the first automated agent auditing framework capable of pinpointing attack origins and tracking exploitation paths. PI-Hunter integrates source-aware test generation, feedback-driven evolutionary search, multi-turn interaction simulation, and adversarial environment modeling to proactively induce and reveal malicious instructions embedded in the environment. Empirical evaluations demonstrate that PI-Hunter substantially outperforms current automated red-teaming approaches across diverse agent architectures, benchmark tasks, and defense mechanisms, achieving significantly higher vulnerability exposure rates and attack surface coverage—even against established defenses—thereby transcending the conventional evaluation paradigm centered solely on attack success rates.
📝 Abstract
Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.
Problem

Research questions and friction points this paper is trying to address.

prompt injection
LLM agents
red-teaming
security vulnerability
external sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt injection
red-teaming
LLM agents
automated auditing
feedback-driven exploration