Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

📅 2025-10-17
🤖 AI Summary
Existing LLM-based web agents struggle with multi-step, goal-directed tasks in open web environments, such as information retrieval, report generation, and online transactions. Linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. Branch-and-Browse addresses these limits with a fine-grained framework that combines explicit subtask management, tree-structured exploration, web state replay with background reasoning, and a page action memory that shares explored actions within and across sessions. On the WebArena benchmark, it achieves a 35.8% task success rate and reduces execution time by up to 40.4% relative to state-of-the-art methods. The main contributions are: (1) tree-structured exploration with subtask management for controllable multi-branch reasoning; (2) a page action memory for reusing explored actions within and across sessions; and (3) a unified framework that targets reasoning depth, controllability, and execution efficiency together.

📝 Abstract
Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.
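To make the idea of tree-structured exploration with backtracking via state replay more concrete, here is a minimal sketch. It is not the authors' implementation: the toy `SITE` graph, the `Node` class, and the `replay` and `explore` functions are all hypothetical names chosen for illustration, and a plain depth-first search stands in for whatever LLM-guided branch selection the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy "website" (assumption): page -> {action label -> next page}.
SITE: Dict[str, Dict[str, str]] = {
    "home": {"click:shop": "shop", "click:blog": "blog"},
    "shop": {"click:cart": "cart", "click:home": "home"},
    "blog": {"click:home": "home"},
    "cart": {"click:checkout": "done"},
    "done": {},
}


@dataclass
class Node:
    """One node of the exploration tree: a page plus the actions that reach it."""
    page: str
    path: Tuple[str, ...] = ()
    children: List["Node"] = field(default_factory=list)


def replay(start: str, path: Tuple[str, ...]) -> str:
    """Restore a page state by re-executing the recorded actions from the start page."""
    page = start
    for action in path:
        page = SITE[page][action]
    return page


def explore(start: str, goal: str, max_depth: int = 4) -> Optional[Tuple[str, ...]]:
    """Depth-first tree exploration; returns the action path to `goal`, or None."""
    stack = [Node(page=start)]
    visited = {start}
    while stack:
        node = stack.pop()
        # Backtracking: jump to this branch by replaying its action path.
        page = replay(start, node.path)
        if page == goal:
            return node.path
        if len(node.path) >= max_depth:
            continue
        for action, nxt in sorted(SITE[page].items(), reverse=True):
            if nxt in visited:
                continue  # skip pages already expanded on another branch
            visited.add(nxt)
            child = Node(page=nxt, path=node.path + (action,))
            node.children.append(child)
            stack.append(child)
    return None


if __name__ == "__main__":
    print(explore("home", "done"))
```

In this sketch the dead-end `blog` branch is visited and abandoned before the search returns to the `shop` branch, which is the kind of backtracking a purely linear agent cannot perform. A real agent would replay browser actions rather than dictionary lookups.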
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning depth and efficiency for web agents
Enabling controllable multi-branch reasoning with tree structures
Reducing computational costs through action memory sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-structured exploration for controllable multi-branch reasoning
Efficient web state replay with background reasoning
Page action memory sharing across sessions
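
One way to picture a page action memory shared within and across sessions is a store keyed by page URL that records which actions were already tried and what they led to. The sketch below is an assumption-laden illustration, not the paper's design: the `PageActionMemory` class, its method names, the URL normalization rule, and the JSON file format are all hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit


class PageActionMemory:
    """Hypothetical store of explored actions per page, optionally persisted to disk."""

    def __init__(self, path=None):
        self.path = Path(path) if path else None
        self.records = {}
        # Load memory left behind by an earlier session, if any.
        if self.path and self.path.exists():
            self.records = json.loads(self.path.read_text())

    @staticmethod
    def key(url):
        """Normalize a URL to host + path (assumption: query and fragment are ignored)."""
        parts = urlsplit(url)
        return f"{parts.netloc}{parts.path}".rstrip("/")

    def record(self, url, action, outcome):
        """Remember that `action` on this page produced `outcome`."""
        self.records.setdefault(self.key(url), {})[action] = outcome

    def explored(self, url):
        """Return a copy of the action -> outcome map for this page."""
        return dict(self.records.get(self.key(url), {}))

    def untried(self, url, candidates):
        """Filter candidate actions down to those not yet explored on this page."""
        seen = self.explored(url)
        return [a for a in candidates if a not in seen]

    def save(self):
        """Persist the memory so a later session can reuse it."""
        if self.path:
            self.path.write_text(json.dumps(self.records, indent=2))
```

A second session that constructs `PageActionMemory` with the same file path starts with the earlier session's records, so it can skip actions already known to be unhelpful. How the actual framework keys pages and summarizes outcomes is not specified in this summary.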