🤖 AI Summary
To address the privacy-leakage and system-hijacking risks that arise in multi-agent collaboration from prompt injection and context manipulation, this paper systematically integrates foundational information-security principles (defense-in-depth, least privilege, and complete mediation) into the full lifecycle design of LLM-based agents. We propose AgentSandbox, a framework that natively embeds these principles in agent protocols via sandboxed execution, fine-grained access control, and context-integrity verification. A three-dimensional evaluation across mainstream LLMs measures benign utility, attack utility, and attack success rate. Results show that benign task performance remains high, privacy-leakage risk is substantially reduced, and attack success rates drop by 76% on average, without compromising usability or user acceptance. This work establishes a scalable, verifiable, principle-driven paradigm for secure LLM agent architecture.
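The principles the summary names can be made concrete with a small sketch. The class and method names below (`AgentSandbox`, `invoke`, `PolicyViolation`) are illustrative assumptions, not the paper's actual API: a reference monitor mediates every tool call (complete mediation) against a per-agent allowlist (least privilege) and records an audit trail.

```python
class PolicyViolation(Exception):
    """Raised when an agent attempts a tool call outside its grant."""


class AgentSandbox:
    """Hypothetical mediator for a single agent's tool invocations."""

    def __init__(self, agent_id, allowed_tools):
        self.agent_id = agent_id
        # Least privilege: the agent may only use tools it was explicitly granted.
        self.allowed_tools = frozenset(allowed_tools)
        self.audit_log = []

    def invoke(self, tool_name, tool_fn, *args, **kwargs):
        # Complete mediation: every call passes through this single checkpoint.
        permitted = tool_name in self.allowed_tools
        self.audit_log.append((self.agent_id, tool_name, permitted))
        if not permitted:
            raise PolicyViolation(f"{self.agent_id} may not call {tool_name}")
        return tool_fn(*args, **kwargs)


# Usage: a summarizer agent granted only a read tool.
sandbox = AgentSandbox("summarizer", allowed_tools={"read_doc"})
print(sandbox.invoke("read_doc", lambda doc_id: f"contents of {doc_id}", "doc-42"))
try:
    # An injected instruction to exfiltrate data is denied at the checkpoint.
    sandbox.invoke("send_email", lambda to: None, "attacker@example.com")
except PolicyViolation:
    print("blocked")
```

Routing every call through one checkpoint means a successful prompt injection can, at worst, request tools the agent was never granted, which the monitor denies and logs.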
📝 Abstract
Large Language Model (LLM) agents show considerable promise for automating complex tasks through contextual reasoning; however, multi-agent interactions and the systems' susceptibility to prompt injection and other forms of context manipulation introduce new vulnerabilities related to privacy leakage and system exploitation. This position paper argues that the well-established design principles of information security, commonly referred to simply as security principles, should be applied when deploying LLM agents at scale. Principles such as defense-in-depth, least privilege, complete mediation, and psychological acceptability have guided the design of mechanisms for securing information systems for over five decades, and we argue that their explicit and conscientious adoption will likewise help secure agentic systems. To illustrate this approach, we introduce AgentSandbox, a conceptual framework that embeds these principles to provide safeguards throughout an agent's life-cycle. We evaluate AgentSandbox with state-of-the-art LLMs along three dimensions: benign utility, attack utility, and attack success rate. It maintains high utility for its intended functions under both benign and adversarial evaluations while substantially mitigating privacy risks. By embedding secure design principles as foundational elements within emerging LLM agent protocols, we aim to foster trustworthy agent ecosystems aligned with user privacy expectations and evolving regulatory requirements.