Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing online skill learning methods for web automation retrieve skills once, based solely on the initial instruction, and reuse them unchanged. They therefore fail to adapt as the webpage state evolves during execution, which leaves many intermediate situations without a matching skill. To address this limitation, this work proposes State-Grounded Dynamic Retrieval (SGDR), which extracts reusable sub-procedures from historical trajectories via a sliding window and stores each one in a dual text-code representation. At each execution step, SGDR retrieves the most suitable skill by matching against both the current webpage state and the task objective, enabling fine-grained, context-aware, step-level skill invocation. This aligns retrieval with execution states rather than with the initial instruction alone, moving beyond the conventional task-level static reuse paradigm. Evaluated on five domains in WebArena, SGDR achieves average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, yielding relative improvements of 10.6% and 10.0% over the strongest baseline, respectively.
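To put the relative gains in absolute terms, the short sketch below back-computes the baseline success rates they imply. It assumes the relative gain is defined as (new - baseline) / baseline, which the summary does not state. The results are approximations derived from rounded figures, not numbers reported by the paper, and the helper name `implied_baseline` is purely illustrative.

```python
def implied_baseline(success_rate: float, relative_gain: float) -> float:
    """Invert relative_gain = (success_rate - baseline) / baseline.

    Assumption: this is the definition of "relative gain" used in the summary.
    """
    return success_rate / (1.0 + relative_gain)


# Figures quoted in the summary (percent, and relative gain as a fraction).
gpt_baseline = implied_baseline(37.5, 0.106)
qwen_baseline = implied_baseline(24.3, 0.100)

print(f"Implied strongest baseline, GPT-4.1:  {gpt_baseline:.1f}%")
print(f"Implied strongest baseline, Qwen3-4B: {qwen_baseline:.1f}%")
```

Under that assumption, the gains correspond to roughly 3.6 and 2.2 absolute percentage points, respectively.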
📝 Abstract
Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a fixed set of skills is retrieved based on the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initial skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable action, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
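The three components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Skill` class, the fixed window size, the bag-of-words `embed` function, and the cosine scoring are all assumptions chosen to keep the example self-contained. A real system would presumably use a learned text encoder and richer page observations.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Skill:
    description: str  # text half of the dual representation, used for retrieval
    code: list[str]   # code half: the action sequence to execute (hypothetical format)


def extract_skills(trajectory, window=2):
    """Slide a fixed-size window over a completed trajectory of (state, action) steps.

    Each window becomes a sub-procedure keyed by the state it starts from,
    so it can be invoked at an intermediate point of a later task.
    """
    skills = []
    for i in range(len(trajectory) - window + 1):
        chunk = trajectory[i:i + window]
        start_state = chunk[0][0]
        actions = [action for _, action in chunk]
        description = start_state + " " + " ".join(actions)
        skills.append(Skill(description=description, code=actions))
    return skills


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real text encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(skills, goal, state, k=1):
    """Rank skills against a query built from BOTH the task goal and the current state."""
    query = embed(goal + " " + state)
    ranked = sorted(skills, key=lambda s: cosine(query, embed(s.description)), reverse=True)
    return ranked[:k]


# A made-up completed trajectory: (webpage state, action taken).
trajectory = [
    ("login page", "type username"),
    ("login page", "click submit"),
    ("dashboard page", "click orders"),
    ("orders page", "click export"),
]

library = extract_skills(trajectory, window=2)

# At each step of a new task, retrieval is repeated with the state observed at that step.
best = retrieve(library, goal="export orders", state="dashboard page")[0]
print(best.code)
```

The point of the sketch is the query construction in `retrieve`: because the current state is part of the query, the top-ranked skill can change from step to step, whereas task-level retrieval would compute the ranking once from the goal alone.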
Problem

Research questions and friction points this paper is trying to address.

online skill learning
web automation
dynamic retrieval
state-grounded
skill reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

online skill learning
state-grounded retrieval
dynamic skill reuse
web automation
language agents