RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage

📅 2025-02-13
🤖 AI Summary
This work addresses two security threats to which tool-based agent systems (TBAS) are inherently exposed: prompt injection attacks and privacy leakage. The authors propose an automated defense framework, Robust TBAS (RTBAS), that requires no routine user intervention. First, they adapt information flow control (IFC) to the TBAS setting, a novel application in this domain. Second, they design two dependency screeners, one using LLM-as-a-judge discrimination and one using attention-based saliency, to dynamically assess tool-call permissions and perform sensitivity analysis. Evaluated on the AgentDojo benchmark, the method blocks all targeted attacks while incurring only a 2% loss of task utility under attack, and attains near-oracle accuracy in detecting both subtle and direct privacy leaks. The core contribution is the first integration of IFC with LLM-based self-assessment, striking a practical balance between security assurance and usability and substantially reducing reliance on manual confirmation.

📝 Abstract
Tool-Based Agent Systems (TBAS) allow Language Models (LMs) to use external tools for tasks beyond their standalone capabilities, such as searching websites, booking flights, or making financial transactions. However, these tools greatly increase the risks of prompt injection attacks, where malicious content hijacks the LM agent to leak confidential data or trigger harmful actions. Existing defenses (OpenAI GPTs) require user confirmation before every tool call, placing onerous burdens on users. We introduce Robust TBAS (RTBAS), which automatically detects and executes tool calls that preserve integrity and confidentiality, requiring user confirmation only when these safeguards cannot be ensured. RTBAS adapts Information Flow Control to the unique challenges presented by TBAS. We present two novel dependency screeners, using LM-as-a-judge and attention-based saliency, to overcome these challenges. Experimental results on the AgentDojo Prompt Injection benchmark show RTBAS prevents all targeted attacks with only a 2% loss of task utility when under attack, and further tests confirm its ability to obtain near-oracle performance on detecting both subtle and direct privacy leaks.
Problem

Research questions and friction points this paper is trying to address.

Defends against prompt injection attacks
Prevents privacy leakage in LLM agents
Reduces user confirmation burden in TBAS
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated tool call detection
Information Flow Control adaptation
LM-as-a-judge dependency screener
👥 Authors
Peter Yong Zhong, Carnegie Mellon University
Siyuan Chen, Carnegie Mellon University
Ruiqi Wang, Carnegie Mellon University
McKenna McCall, Colorado State University (Formal Methods for Security and Privacy; Usable Security)
Ben L. Titzer, Carnegie Mellon University
Heather Miller, Carnegie Mellon University (Programming Languages; Distributed Programming; Parallel Programming; Concurrent Programming)