🤖 AI Summary
The scarcity of large-scale, real-world human web-interaction data with high-quality annotations has severely hindered reproducible research on web-based intelligent agents. To address this gap, this work introduces a large open-source dataset of 31,725 trajectories (318,000 steps), built on a data paradigm that aligns visual, structural, and action modalities. A scalable human-trajectory collection pipeline ensures coverage of high-value, complex web tasks. The work further proposes a dual mid-training strategy that decouples spatial grounding from task planning, achieving state-of-the-art performance on the newly curated WebChainBench as well as multiple public GUI benchmarks. This approach significantly improves generalization in real-world web environments, providing critical data and methodological foundations for the next generation of scalable web agents.
📝 Abstract
We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectories and 318k steps, featuring a core Triple Alignment of visual, structural, and action data to provide rich, multi-modal supervision. The data is collected via a scalable pipeline that ensures coverage of complex, high-value tasks often missed by synthetic methods. Leveraging this dataset, we propose a Dual Mid-Training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on our proposed WebChainBench and other public GUI benchmarks. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.
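To make the Triple Alignment concrete, the sketch below models one plausible shape for a triple-aligned trajectory record, where each step pairs a screenshot (visual), a DOM snapshot (structural), and an executed action. All field and class names here are illustrative assumptions, not the actual WebChain schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema sketch; names are assumptions, not the WebChain release format.
@dataclass
class Step:
    screenshot: str   # visual modality: path to the rendered page screenshot
    dom_snapshot: str # structural modality: serialized DOM/HTML at this step
    action: dict      # action modality: e.g. {"type": "click", "target": "#submit"}

@dataclass
class Trajectory:
    task: str                              # natural-language task description
    steps: List[Step] = field(default_factory=list)

# Example: a two-step trajectory for a form-submission task.
traj = Trajectory(task="Subscribe to the newsletter")
traj.steps.append(Step("step0.png", "<html>...</html>",
                       {"type": "type", "target": "#email", "text": "a@b.com"}))
traj.steps.append(Step("step1.png", "<html>...</html>",
                       {"type": "click", "target": "#subscribe"}))
print(len(traj.steps))  # 2
```

Aligning all three modalities per step lets a single trajectory supervise both grounding (mapping instructions to on-screen targets) and planning (choosing the next action), which is what the dual mid-training recipe decouples.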