AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language model (LLM) agents face security risks from prompt injection attacks during tool invocation. Method: This paper proposes a program-analysis-based defense framework that models the agent's runtime execution trace as a structured graph intermediate representation, unifying control-flow, data-flow, and program-dependence graphs, and introduces a lightweight type system for static verification of sensitive data flows and trust boundaries. A graph constructor and a security-metadata registration mechanism enable fine-grained policy enforcement. Contribution/Results: Evaluated on the AgentDojo benchmark, the approach achieves a 95.75% true positive rate with only a 3.66% false positive rate, significantly outperforming existing defenses. This work pioneers the systematic application of classical program analysis techniques to the formal security verification of LLM agents, establishing a new methodological pathway for agent-level safety assurance.

📝 Abstract
Large Language Model (LLM) agents offer a powerful new paradigm for solving diverse problems by combining natural language reasoning with the execution of external tools. However, their dynamic and non-transparent behavior introduces critical security risks, particularly in the presence of prompt injection attacks. In this work, we propose a novel insight that treats agent runtime traces as structured programs with analyzable semantics. Building on this insight, we present AgentArmor, a program analysis framework that converts agent traces into graph-based intermediate representations of program dependencies (e.g., CFG, DFG, and PDG) and enforces security policies via a type system. AgentArmor consists of three key components: (1) a graph constructor that reconstructs the agent's working traces as graph-based intermediate representations capturing control flow and data flow; (2) a property registry that attaches security-relevant metadata to the tools and data the agent interacts with; and (3) a type system that performs static inference and checking over the intermediate representation. By representing agent behavior as structured programs, AgentArmor enables program analysis over sensitive data flows, trust boundaries, and policy violations. We evaluate AgentArmor on the AgentDojo benchmark; the results show that AgentArmor achieves a TPR of 95.75% with an FPR of only 3.66%. Our results demonstrate AgentArmor's ability to detect prompt injection vulnerabilities and enforce fine-grained security constraints.
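The trace-as-program view can be illustrated with a minimal sketch. The `TraceNode` structure, trust labels, and tool names below are hypothetical illustrations of a data-flow graph over a trace, not the paper's actual IR:

```python
from dataclasses import dataclass, field

# Hypothetical node in the graph IR: one tool call or LLM step in the trace.
@dataclass
class TraceNode:
    name: str
    trust: str                                   # "trusted" or "untrusted" source
    inputs: list = field(default_factory=list)   # data-flow edges from earlier nodes

# Toy data-flow graph for a trace: user query -> web fetch -> email send.
query = TraceNode("user_query", trust="trusted")
web = TraceNode("fetch_webpage", trust="untrusted", inputs=[query])
email = TraceNode("send_email", trust="trusted", inputs=[web])

def untrusted_ancestors(node):
    """Walk data-flow edges backwards, collecting untrusted sources."""
    found = []
    stack = [node]
    while stack:
        n = stack.pop()
        if n.trust == "untrusted":
            found.append(n.name)
        stack.extend(n.inputs)
    return found

# The email call is transitively influenced by untrusted web content.
print(untrusted_ancestors(email))  # ['fetch_webpage']
```

Once a trace is in this graph form, policy questions like "does any untrusted source reach a sensitive sink?" reduce to ordinary graph reachability.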
Problem

Research questions and friction points this paper is trying to address.

Defends against prompt injection attacks in LLM agents
Converts agent runtime traces into analyzable program representations
Enforces security policies via static type system checking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Treats agent runtime traces as structured programs with analyzable semantics
Unifies control-flow, data-flow, and program-dependence graphs into a single graph IR
Performs static type inference and checking over the IR to enforce security policies
Peiran Wang
ByteDance
Yang Liu
ByteDance
Yunfei Lu
Huawei
Large Language Model · Machine Translation · Data Mining
Yifeng Cai
ByteDance
Hongbo Chen
ByteDance
Qingyou Yang
ByteDance
Jie Zhang
ByteDance
Jue Hong
ByteDance
Data Security & Privacy · AI & Agent Security
Ye Wu
ByteDance