Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing web agents rely on handcrafted execution pipelines or costly expert trajectories, which limits their adaptability in complex, dynamic environments. This work proposes SCALE (Self-Cognitive-Aware Learning and Exploration), a framework in which three adversarial roles (Selector, Predictor, and Judger) let the agent autonomously discover its own limitations and expand its cognitive boundaries through environmental exploration. A graph exploration strategy, SCALE-Hop, supports global planning and helps the agent avoid local exploration traps. To support learning, the authors construct SCALE-20k, a large-scale dataset collected from 19 real-world websites, with structured demonstrations generated from SCALE's own exploration traces rather than from expert data. Experiments show improved performance and generalization for multiple multimodal large language models across various web environments.
📝 Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their adaptability to complex, dynamic environments. To address these challenges, we propose SCALE (Self-Cognitive-Aware Learning and Exploration), which leverages three adversarial roles (Selector, Predictor, and Judger) to autonomously discover the agent's limitations and expand its cognitive boundaries through environmental exploration. Moreover, we propose SCALE-Hop, a graph exploration strategy that facilitates global planning and helps agents avoid local exploration traps. To further support learning, we construct SCALE-20k, a large-scale dataset collected from 19 real-world websites, containing diverse task types and structured demonstrations generated from SCALE's exploration traces. Experimental results show that our approach significantly improves the performance and generalization of multiple MLLMs in various web environments. Our framework offers a scalable and generalizable solution for building truly autonomous and adaptive web agents.
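The abstract does not say how SCALE-Hop traverses a site, so the following is only a generic illustration of what multi-hop exploration over a page graph can look like, not the paper's algorithm. It assumes a website can be modeled as a directed graph of pages and uses a hop-bounded breadth-first traversal; the names multi_hop_explore, TOY_SITE, and max_hops are hypothetical.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages reachable in one click.
TOY_SITE = {
    "home": ["search", "cart"],
    "search": ["item", "home"],
    "item": ["cart", "search"],
    "cart": ["checkout"],
    "checkout": [],
}


def multi_hop_explore(links, start, max_hops):
    """Return {page: hops} for every page reachable within max_hops clicks.

    This is a plain bounded breadth-first search, used here as an assumed
    stand-in for graph exploration; it is not taken from the paper.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # do not expand pages already at the hop limit
        for nxt in links.get(page, []):
            if nxt not in hops:  # skip pages already seen, so loops cannot trap the walk
                hops[nxt] = hops[page] + 1
                queue.append(nxt)
    return hops


one_hop = multi_hop_explore(TOY_SITE, "home", max_hops=1)
two_hop = multi_hop_explore(TOY_SITE, "home", max_hops=2)
print(sorted(one_hop))  # pages visible with single-hop exploration
print(sorted(two_hop))  # pages visible once a second hop is allowed
```

In this toy graph the "checkout" page only becomes visible at two hops, which is the kind of state a purely local, one-step explorer would never record. Tracking visited pages is what keeps the traversal from cycling between "home" and "search".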
Problem

Research questions and friction points this paper is trying to address.

web agents
adaptability
dynamic environments
Multimodal Large Language Models
autonomous exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-improving agent
cognitive-aware exploration
adversarial role framework
graph-based exploration
multimodal web automation