🤖 AI Summary
This work addresses a challenge for large language model (LLM) agents: balancing a user's privacy policy against task utility when interacting with third-party systems, particularly under adversarial probing. The authors propose POLAR-Bench, a benchmark built on an orthogonal two-axis diagnostic framework. It quantifies the privacy–utility trade-off through policy-aware adversarial dialogues, deterministic set-membership scoring, and systematic combinations of privacy-policy dimensions and attack strategies. Evaluation across 7,852 samples spanning ten domains shows that frontier models withhold over 99% of protected attributes, whereas smaller open-weight models (1–30B parameters) perform notably worse, with the weakest leaking more than half of the protected information. These findings expose a critical gap in the privacy alignment of current LLMs.
📝 Abstract
LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership and vary privacy policy dimension and attack strategy along two orthogonal axes, producing a 5 × 5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1–30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half. POLAR-Bench thus localizes where each model's intent-following breaks down, providing a foothold for privacy alignment where it matters most.
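The abstract does not give the scoring procedure in detail. As a rough illustration, here is one plausible reading of "deterministic set-membership" scoring combined with the two-axis surface, written in Python. Assume each sample records the canonicalized values the policy protects, the values the task needs, and the values the third party actually obtained. Under that assumption, privacy is the fraction of protected values never disclosed (a set difference), and utility is the fraction of needed values that got through (a set intersection). Scores are then grouped by (policy dimension, attack strategy) cell, giving a 5 × 5 grid when each axis has five levels. The `Sample` fields, the pooled attribute-level aggregation, and the axis labels in the usage line are hypothetical choices, not POLAR-Bench's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical sample schema; POLAR-Bench's real data format may differ.
@dataclass(frozen=True)
class Sample:
    policy_dim: str              # privacy-policy dimension label (axis 1), illustrative
    attack: str                  # attack-strategy label (axis 2), illustrative
    protected: frozenset[str]    # canonical values the policy forbids sharing
    task_needed: frozenset[str]  # canonical values the task requires sharing
    disclosed: frozenset[str]    # canonical values the third party obtained


def diagnostic_surface(samples):
    """Map (policy_dim, attack) -> (privacy, utility), pooled over attributes.

    Assumption: scores are pooled per cell over all attributes rather than
    averaged per sample; empty denominators count as a perfect score.
    """
    # counts[cell] = [protected_withheld, protected_total, needed_shared, needed_total]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for s in samples:
        c = counts[(s.policy_dim, s.attack)]
        c[0] += len(s.protected - s.disclosed)    # protected values never disclosed
        c[1] += len(s.protected)
        c[2] += len(s.task_needed & s.disclosed)  # required values that got through
        c[3] += len(s.task_needed)
    return {
        cell: (w / p if p else 1.0, n / t if t else 1.0)
        for cell, (w, p, n, t) in counts.items()
    }


# Usage with made-up labels and values: one sample withholds its protected
# attribute, the other leaks it; both complete the task.
samples = [
    Sample("health", "direct_ask", frozenset({"diabetes"}),
           frozenset({"zip:94110"}), frozenset({"zip:94110"})),
    Sample("health", "direct_ask", frozenset({"hiv+"}),
           frozenset({"zip:10001"}), frozenset({"hiv+", "zip:10001"})),
]
print(diagnostic_surface(samples)[("health", "direct_ask")])  # (0.5, 1.0)
```

Because membership checks on canonical values involve no LLM judge, the scores are deterministic. The trade-off is that paraphrased or partial disclosures only count if canonicalization maps them to the same string. Averaging per sample instead of pooling would weight samples with few protected attributes more heavily.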