WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

📅 2026-03-05
🤖 AI Summary
The scarcity of large-scale, real-world human web interaction data with high-quality annotations has hindered reproducible research on web agents. To address this gap, this work introduces WebChain, an open-source dataset of 31,725 human-annotated trajectories (about 318,000 steps) in which each step aligns visual, structural, and action modalities. The trajectories are collected through a scalable human collection pipeline that targets complex, high-value web tasks. The authors also propose a dual mid-training strategy that decouples spatial grounding from task planning, and they report state-of-the-art performance on the newly curated WebChainBench as well as on multiple public GUI benchmarks. Together, the dataset and training recipe are intended to improve model generalization in real-world web environments and to support the development of scalable web agents.

📝 Abstract
We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectories and 318k steps, featuring a core Triple Alignment of visual, structural, and action data to provide rich, multi-modal supervision. The data is collected via a scalable pipeline that ensures coverage of complex, high-value tasks often missed by synthetic methods. Leveraging this dataset, we propose a Dual Mid-Training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on our proposed WebChainBench and other public GUI benchmarks. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.
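As a rough illustration of what a step with aligned visual, structural, and action data might look like, the following is a minimal sketch. Every class and field name here (`Step`, `Trajectory`, `screenshot_path`, `dom_snippet`, `target_bbox`) is a hypothetical placeholder, not WebChain's published schema; only the trajectory and step counts come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One interaction step. Field names are assumptions, not the real schema."""
    screenshot_path: str                    # visual modality
    dom_snippet: str                        # structural modality (serialized element)
    action_type: str                        # action modality, e.g. "click" or "type"
    target_bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    text: str = ""                          # typed text, if any


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)


def is_aligned(step: Step) -> bool:
    """Treat a step as aligned only if all three modalities are present
    and the target box has positive width and height."""
    left, top, right, bottom = step.target_bbox
    has_all = bool(step.screenshot_path and step.dom_snippet and step.action_type)
    return has_all and right > left and bottom > top


def mean_steps(trajectories: List[Trajectory]) -> float:
    """Average number of steps per trajectory (0.0 for an empty list)."""
    if not trajectories:
        return 0.0
    return sum(len(t.steps) for t in trajectories) / len(trajectories)


# Counts reported in the abstract; 318k is a rounded figure.
REPORTED_TRAJECTORIES = 31_725
REPORTED_STEPS = 318_000
approx_mean_steps = REPORTED_STEPS / REPORTED_TRAJECTORIES  # roughly 10 steps each
```

Under these assumptions, the reported totals work out to roughly ten steps per trajectory, which gives a sense of how long a typical annotated task is.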
Problem

Research questions and friction points this paper is trying to address.

web agents
human-annotated dataset
web interaction traces
real-world websites
multi-modal supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Triple Alignment
Dual Mid-Training
Web Interaction Dataset
Multi-modal Supervision
Web Agents
Sicheng Fan
Fudan University
Rui Wan
Fudan University
Yifei Leng
IMean AI
Gaoning Liang
IMean AI
Li Ling
KTH - Royal Institute of Technology
computer vision, deep learning, robotics, autonomous navigation
Yanyi Shang
IMean AI
Dehan Kong
IMean AI