🤖 AI Summary
This work addresses context degradation in long-horizon web agents: processing the full DOM and accessibility tree at every action step fills the context with redundant tokens and erodes reasoning over time. The paper proposes Signal-Driven Observation (SDO), an architecture that treats on-demand observation as a core design principle. A dedicated sub-call reads the full DOM but returns only task-relevant elements and their selectors, and it is re-invoked only when a lightweight signal detector fires on a salient event, such as a URL transition, newly visible interactive elements, an action failure, or an exogenous browser event. By decoupling observation frequency from action frequency, SDO aims to cut redundant information processing and mitigate context degradation. The paper also outlines the open problems SDO introduces and calls for treating observation compression as a core architectural decision in web agent design.
📝 Abstract
Web agents operating over long horizons ingest raw DOM and accessibility trees (routinely tens of thousands of tokens) at every action step, causing progressive context degradation that erodes reasoning well before tasks complete. We argue that this coupling of observation frequency to action frequency is an architectural mistake. Drawing on the insight from Recursive Language Models that querying a document outperforms reading it wholesale, we propose Signal-Driven Observation (SDO): a dedicated sub-call reads the full DOM but returns only task-relevant elements and their selectors, and is re-invoked only when a lightweight signal detector fires on URL transitions, newly visible interactive elements, action failures, or exogenous browser events. We outline the open problems SDO introduces and call on the community to treat observation compression as a core architectural decision in web agent design.
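The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `PageState`, `detect_signals`, `observe`, and `run` are hypothetical names, the keyword filter merely stands in for the observation sub-call (which the abstract does not specify in detail), and the `initial_load` signal is an added assumption so that the first step has something to observe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageState:
    """Cheap browser snapshot (hypothetical shape, not from the paper)."""
    url: str
    interactive_ids: frozenset = frozenset()
    last_action_failed: bool = False
    exogenous_event: bool = False  # e.g. a dialog the agent did not request


def detect_signals(prev: Optional[PageState], curr: PageState) -> list:
    """Return the names of the signals that fired between two snapshots."""
    if prev is None:
        return ["initial_load"]  # assumption: always observe on the first step
    signals = []
    if curr.url != prev.url:
        signals.append("url_transition")
    if curr.interactive_ids - prev.interactive_ids:
        signals.append("new_interactive_elements")
    if curr.last_action_failed:
        signals.append("action_failure")
    if curr.exogenous_event:
        signals.append("exogenous_event")
    return signals


def observe(dom: list, task_keywords: set) -> list:
    """Stand-in for the observation sub-call: scan the full DOM, but return
    only task-relevant elements with their selectors."""
    return [
        {"selector": el["selector"], "text": el["text"]}
        for el in dom
        if any(k in el["text"].lower() for k in task_keywords)
    ]


def run(steps: list, task_keywords: set) -> tuple:
    """Walk (PageState, dom) pairs; re-observe only when a signal fires.
    Returns the last cached observation and the number of observe calls."""
    prev, cached, calls = None, [], 0
    for state, dom in steps:
        if detect_signals(prev, state):
            cached = observe(dom, task_keywords)
            calls += 1
        prev = state
    return cached, calls


# Toy trace: three action steps, but only two of them trigger an observation.
dom_a = [{"selector": "#search", "text": "Search flights"},
         {"selector": "#ad", "text": "Sponsored banner"}]
dom_b = [{"selector": "#book", "text": "Book flight"},
         {"selector": "#footer", "text": "Terms"}]
steps = [
    (PageState("https://example.test/", frozenset({"search"})), dom_a),
    (PageState("https://example.test/", frozenset({"search"})), dom_a),
    (PageState("https://example.test/results",
               frozenset({"search", "book"})), dom_b),
]
observation, calls = run(steps, {"flight"})
```

In this toy trace the second step fires no signal, so the agent reuses its cached observation instead of re-reading the DOM. The point of the sketch is only that the number of `observe` calls is driven by signals rather than by the number of actions; a real detector would need to derive `PageState` from browser events rather than receive it ready-made.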