π€ AI Summary
This work identifies a novel indirect prompt injection (IPI) attack surface in large language model (LLM)-based web navigation agents that rely on accessibility tree parsing of HTML. Attackers can stealthily inject triggers via maliciously crafted HTML elements across cross-site contexts, enabling behavioral hijackingβe.g., credential exfiltration or forced clicks. We propose the first general-purpose IPI attack paradigm targeting accessibility trees and introduce an efficient adversarial HTML trigger generation method combining Greedy Coordinate Gradient optimization with the BrowserGym framework, evaluated on Llama-3.1. Experiments demonstrate high success rates for both goal-directed and generalized attacks on real-world websites. Our code and interactive demo system are publicly released.
π Abstract
This work demonstrates that LLM-based web navigation agents offer powerful automation capabilities but are vulnerable to Indirect Prompt Injection (IPI) attacks. We show that adversaries can embed universal adversarial triggers in webpage HTML to hijack agent behavior that utilizes the accessibility tree to parse HTML, causing unintended or malicious actions. Using the Greedy Coordinate Gradient (GCG) algorithm and a Browser Gym agent powered by Llama-3.1, our system demonstrates high success rates across real websites in both targeted and general attacks, including login credential exfiltration and forced ad clicks. Our empirical results highlight critical security risks and the need for stronger defenses as LLM-driven autonomous web agents become more widely adopted. The system software (https://github.com/sej2020/manipulating-web-agents) is released under the MIT License, with an accompanying publicly available demo website (http://lethaiq.github.io/attack-web-llm-agent).