Disclosure Audits for LLM Agents

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Quantifying and auditing privacy risks, particularly sensitive data leakage, in large language model (LLM) agents during prolonged, interactive dialogues remains challenging. Method: We propose the first dialogue-oriented privacy auditing framework, featuring (i) a multi-round adversarial probing mechanism (CMPL) that overcomes the limitations of single-turn detection; (ii) a quantifiable dialogue privacy risk metric; and (iii) the first open benchmark for dialogue privacy evaluation. The approach integrates adversarial prompt engineering, multi-turn conversation modeling, quantitative privacy leakage assessment, and testing across data modalities and security configurations. Results: Extensive experiments across diverse domains, modalities, and security configurations demonstrate that the framework uncovers deep, previously undetected privacy vulnerabilities that lie beyond the reach of existing single-turn defenses, and it establishes a reproducible, scalable, and rigorous evaluation paradigm for privacy governance of LLM agents.
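The summary above names the main moving parts: an adaptive multi-turn prober and a per-turn leakage check. The sketch below shows how such an audit loop could be wired together in Python; the `agent.respond` and `adversary.next_probe` interfaces, the attribute list, and the substring-based `detect_disclosure` helper are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class AuditResult:
    """Records which sensitive attributes leaked and at which dialogue turn."""
    leaked: Dict[str, int] = field(default_factory=dict)  # attribute name -> first leaking turn

def detect_disclosure(response: str, attribute_value: str) -> bool:
    """Stand-in detector: a naive substring match where a real audit would use
    a stronger check (regexes, NER, or an LLM judge)."""
    return attribute_value.lower() in response.lower()

def run_multi_turn_audit(agent, adversary, sensitive_attrs: Dict[str, str],
                         max_turns: int = 10) -> AuditResult:
    """Iteratively probe the agent across several turns.
    `agent.respond(prompt, history)` and `adversary.next_probe(history)`
    are assumed interfaces, not the paper's API."""
    history: List[Tuple[str, str]] = []
    result = AuditResult()
    for turn in range(max_turns):
        probe = adversary.next_probe(history)      # adaptive, history-conditioned prompt
        reply = agent.respond(probe, history)      # agent operating under a privacy directive
        history.append((probe, reply))
        for name, value in sensitive_attrs.items():
            if name not in result.leaked and detect_disclosure(reply, value):
                result.leaked[name] = turn         # record the first turn of disclosure
    return result
```

Conditioning each probe on the full history is what distinguishes this kind of audit from single-turn detection: a disclosure may only occur after several seemingly benign exchanges.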

📝 Abstract
Large Language Model agents have begun to appear as personal assistants, customer service bots, and clinical aides. While these applications deliver substantial operational benefits, they also require continuous access to sensitive data, which increases the likelihood of unauthorized disclosures. This study proposes an auditing framework for conversational privacy that quantifies and audits these risks. The proposed Conversational Manipulation for Privacy Leakage (CMPL) framework is an iterative probing strategy designed to stress-test agents that enforce strict privacy directives. Rather than focusing solely on a single disclosure event, CMPL simulates realistic multi-turn interactions to systematically uncover latent vulnerabilities. Our evaluation on diverse domains, data modalities, and safety configurations demonstrates the auditing framework's ability to reveal privacy risks that are not deterred by existing single-turn defenses. In addition to introducing CMPL as a diagnostic tool, the paper delivers (1) an auditing procedure grounded in quantifiable risk metrics and (2) an open benchmark for evaluating conversational privacy across agent implementations.
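To make the "quantifiable risk metrics" mentioned in the abstract concrete, one simple formulation is a per-attribute leakage rate aggregated over many audited dialogues, optionally paired with the average turn at which a disclosure first occurs. The functions and result layout below are illustrative assumptions rather than the paper's exact definitions.

```python
def leakage_rate(audits, attribute, max_turns=10):
    """Illustrative metric: fraction of audited dialogues in which `attribute`
    was disclosed within `max_turns` turns (lower is better).

    Each element of `audits` is assumed to map a sensitive attribute name to
    the first turn index at which it leaked (absent if it never leaked)."""
    if not audits:
        return 0.0
    leaked = sum(1 for a in audits if a.get(attribute, max_turns) < max_turns)
    return leaked / len(audits)

def mean_turn_to_leak(audits, attribute, max_turns=10):
    """Illustrative companion metric: average turn at which `attribute` first
    leaked, counting dialogues with no leak as `max_turns` (higher is safer)."""
    if not audits:
        return float(max_turns)
    turns = [a.get(attribute, max_turns) for a in audits]
    return sum(turns) / len(turns)

# Example: three audited dialogues; "home_address" leaked at turn 2 in one of them.
audits = [{"home_address": 2}, {}, {"phone": 5}]
print(leakage_rate(audits, "home_address"))      # 0.333...
print(mean_turn_to_leak(audits, "home_address")) # (2 + 10 + 10) / 3 = 7.33
```

Metrics of this shape make different agent implementations and safety configurations directly comparable on the same benchmark dialogues.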
Problem

Research questions and friction points this paper is trying to address.

Auditing privacy risks in LLM agents handling sensitive data
Detecting latent vulnerabilities through multi-turn interaction simulations
Evaluating conversational privacy beyond single-turn defense mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative probing strategy for privacy audits
Multi-turn interaction simulation for vulnerability detection
Quantifiable risk metrics and open benchmark