DECEPTICON: How Dark Patterns Manipulate Web Agents

📅 2025-12-28
🤖 AI Summary
This work exposes the systemic manipulation risk that deceptive UIs (i.e., dark patterns) pose to web-based AI agents: stronger models are paradoxically more susceptible, and mainstream defenses, including prompt engineering and guardrail models, prove largely ineffective against such semantic-level attacks. To address this gap, we introduce DECEPTICON, the first benchmark for evaluating dark pattern robustness in web agents. It comprises 700 web navigation tasks (600 generated, 100 real-world), enabling controlled dark pattern injection and quantitative assessment of instruction following. Through multi-agent comparative evaluation and human baseline analysis, we find dark patterns successfully induce erroneous actions in over 70% of tasks, substantially exceeding the human error rate of 31%. Moreover, attack success rates increase with model scale and reasoning intensity. This study provides the first empirical quantification of dark pattern efficacy against AI agents, establishing a novel evaluation paradigm and foundational evidence for web agent security.

📝 Abstract
Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 700 web navigation tasks with dark patterns -- 600 generated tasks and 100 real-world tasks, designed to measure instruction-following success and dark pattern effectiveness. Across state-of-the-art agents, we find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks -- compared to a human average of 31%. Moreover, we find that dark pattern effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading countermeasures against adversarial attacks, including in-context prompting and guardrail models, fail to consistently reduce the success rate of dark pattern interventions. Our findings reveal dark patterns as a latent and unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.
Problem

Research questions and friction points this paper is trying to address.

Dark patterns can steer web agents into actions misaligned with user goals.
No prior benchmark isolates individual dark patterns to quantify their effect on agent robustness.
Existing defenses (in-context prompting, guardrail models) fail to consistently block these semantic-level manipulations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces DECEPTICON, an environment for testing individual dark patterns in isolation
Uses 700 web navigation tasks (600 generated, 100 real-world) to measure agent vulnerability
Shows susceptibility grows with model size and test-time reasoning
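The headline metric described above is an attack success rate: the fraction of tasks in which a dark pattern steers the agent's trajectory, compared against a human baseline. A minimal sketch of that computation, assuming a hypothetical per-task result record (the paper does not specify its logging format; `TaskResult` and `attack_success_rate` are illustrative names):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Outcome of one web navigation task (hypothetical record format)."""
    task_completed: bool    # did the agent follow the user's instruction?
    took_dark_action: bool  # did it perform the dark-pattern-induced action?

def attack_success_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks where the dark pattern steered the trajectory."""
    if not results:
        return 0.0
    return sum(r.took_dark_action for r in results) / len(results)

# Toy illustration: 7 of 10 tasks manipulated, echoing the paper's
# reported >70% agent rate versus a 31% human average.
agent_runs = [
    TaskResult(task_completed=False, took_dark_action=(i < 7))
    for i in range(10)
]
print(f"ASR: {attack_success_rate(agent_runs):.0%}")  # ASR: 70%
```

Separating `task_completed` from `took_dark_action` matters because the two can diverge: an agent may finish the user's task and still click a manipulative element along the way.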