Multi-Agent Computer Use

📅 2026-05-31

🤖 AI Summary

Existing single-agent computer use systems struggle with complex, long-horizon tasks because they decompose tasks poorly, run serially, and have difficulty replanning as new information arrives. This work proposes a Multi-Agent Computer Use (MACU) system that brings multi-agent coordination to this domain. MACU uses a manager agent to model each task as a directed acyclic graph (DAG), so that subtasks whose dependencies are satisfied can be executed in parallel by computer use subagents. The manager revises the DAG as new observations arrive and carries information across nodes that downstream subagents may not be able to re-observe. Experiments show that MACU outperforms strong single-agent baselines by 3.4–25.5% on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, speeds up average wall-clock task completion on Odysseys by roughly 1.5×, and exhibits more favorable test-time scaling.
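To see why running independent subtasks in parallel can shorten wall-clock time, the sketch below compares a serial schedule with an idealized parallel one. It is an illustration only: the node names, durations, and the functions `serial_time` and `critical_path_time` are hypothetical, and the model ignores manager overhead, replanning, and limits on the number of subagents.

```python
from functools import lru_cache


def serial_time(durations):
    """Wall-clock time if one agent runs every subtask back to back."""
    return sum(durations.values())


def critical_path_time(durations, deps):
    """Wall-clock time with unlimited parallel subagents.

    A node can start only after all of its dependencies finish, so the
    total time is the longest dependency chain (the critical path).
    """
    @lru_cache(maxsize=None)
    def finish(node):
        upstream = max((finish(d) for d in deps.get(node, ())), default=0)
        return upstream + durations[node]

    return max(finish(node) for node in durations)


# Made-up durations in minutes, not taken from the paper.
durations = {"search_site_a": 3, "search_site_b": 5, "compare": 2}
deps = {"compare": ("search_site_a", "search_site_b")}

print(serial_time(durations))               # 10
print(critical_path_time(durations, deps))  # 7
```

In this toy case the two searches overlap, so the parallel schedule is bounded by the slower search plus the comparison step. Real speedups depend on how much of the task graph is actually independent.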

📝 Abstract

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We propose a general multi-agent setup in which a manager model decomposes computer use tasks into a directed acyclic graph (DAG), encoding relevant dependencies and goals for subagents. At each iteration, the manager dispatches parallel CUA subagents to carry out nodes on the ready frontier of the DAG, and continuously revises the DAG (adding, canceling, or rewriting nodes) as new findings arrive from subagents. This design treats the partially observable environment of computer use as a first-class challenge: information that downstream agents may not be able to re-observe is retained and passed forward through the manager and DAG structure. We demonstrate that MACU consistently improves over strong single-agent baselines by 3.4–25.5% on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, exhibits more favorable test-time scaling, and solves complex long-horizon tasks where single-agent CUAs get stuck. On Odysseys, a long-horizon web navigation benchmark, MACU speeds up average wall-clock task completion by ${\sim} 1.5 \times$, demonstrating its efficacy in speeding up traditionally slow CUA pipelines. Our findings highlight that multi-agent coordination is a promising axis for scaling computer use agents to work productively for longer and more effectively. We release all code and interactive visualizations at https://jykoh.com/multi-agent-computer-use.
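The manager loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation: `Node`, `ready_frontier`, `run_manager`, `fake_subagent`, and `fake_replan` are hypothetical names, the subagent and the replanning step are stubs, and threads stand in for parallel CUA sessions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """One subtask in the manager's DAG (hypothetical structure)."""
    goal: str
    deps: set = field(default_factory=set)
    status: str = "pending"  # pending | done | canceled
    finding: str = ""        # what the subagent reported back


def ready_frontier(dag):
    """Names of pending nodes whose dependencies are all done."""
    return [
        name for name, node in dag.items()
        if node.status == "pending"
        and all(dag[d].status == "done" for d in node.deps)
    ]


def run_manager(dag, run_subagent, replan, max_iters=20):
    """Dispatch the ready frontier in parallel, then let the manager revise the DAG."""
    for _ in range(max_iters):
        frontier = ready_frontier(dag)
        if not frontier:
            break

        def work(name):
            # Forward upstream findings, since the subagent may not be
            # able to re-observe them in the environment.
            context = {d: dag[d].finding for d in dag[name].deps}
            return name, run_subagent(dag[name].goal, context)

        with ThreadPoolExecutor(max_workers=len(frontier)) as pool:
            results = list(pool.map(work, frontier))

        for name, finding in results:
            dag[name].status = "done"
            dag[name].finding = finding

        # The manager may add, cancel, or rewrite nodes here.
        replan(dag)
    return dag


def fake_subagent(goal, context):
    """Stub for a real CUA subagent: reports which findings it received."""
    return f"{goal} (saw: {sorted(context)})"


def fake_replan(dag):
    """Stub revision step: add one follow-up node after the comparison finishes."""
    if dag["compare"].status == "done" and "report" not in dag:
        dag["report"] = Node("write summary", deps={"compare"})


dag = {
    "site_a": Node("find price on site A"),
    "site_b": Node("find price on site B"),
    "compare": Node("compare prices", deps={"site_a", "site_b"}),
}
result = run_manager(dag, fake_subagent, fake_replan)
print(result["compare"].finding)  # compare prices (saw: ['site_a', 'site_b'])
```

In this toy run the two site lookups are dispatched together, the comparison runs once both have finished and receives their findings as context, and the stub replanner then appends a `report` node that is executed on the next iteration.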

Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Systems

Computer Use Agents

Long-Horizon Tasks

Task Decomposition

Parallel Execution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Computer Use

Task Decomposition

Directed Acyclic Graph (DAG)