Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

📅 2026-06-03

🤖 AI Summary
Existing GUI agents predominantly assume static interfaces, which leaves them ill-suited to dynamic, continuously updating environments such as short-video platforms. To address this limitation, this work introduces LivingScreen, a benchmark that formalizes the Living-Screen-Native GUI agent task. It proposes observation control as a new capability dimension and provides a realistic browser-based simulation environment, a three-tier task suite, and evaluation metrics that jointly account for accuracy and information efficiency. Experiments show that no state-of-the-art model matches human cost-accuracy performance on dynamic interfaces, and that the dominant failure mode is observing too little or too much. These findings point to observation control as a critical gap in current GUI agent designs.
📝 Abstract
GUI agents today assume a static screen, where the world is frozen between two actions. However, real interfaces such as short-video applications violate this assumption: their content keeps playing, and a competent user must decide what to watch and for how long. We formalize this task as Living-Screen-Native GUI agents and introduce LivingScreen, the first benchmark instantiating it on short-video platforms, with a faithful browser-based environment, a three-tier task suite, and metrics that jointly score accuracy and information efficiency. Evaluating a wide range of frontier models, we find that none reaches human cost-accuracy performance, and that their dominant failure mode is over- and under-observation, pointing to observation control as a missing capability axis for future GUI agents. All data and code will be available at https://github.com/BITHLP/LivingScreen.
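The abstract does not define its metrics, so the following is only a hypothetical illustration of what scoring accuracy together with information efficiency could look like. It is not LivingScreen's actual metric: the `Episode` fields, the `summarize` helper, and the frame-ratio cost are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical record of one task attempt (not from the paper).
    correct: bool          # did the agent answer the task correctly?
    frames_observed: int   # frames the agent chose to look at
    frames_available: int  # frames that played during the episode


def summarize(episodes):
    """Return (accuracy, mean observation ratio) over a list of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    accuracy = sum(e.correct for e in episodes) / n
    # Assumed cost proxy: fraction of the playing content the agent watched.
    obs_ratio = sum(e.frames_observed / e.frames_available for e in episodes) / n
    return accuracy, obs_ratio


episodes = [
    Episode(correct=True, frames_observed=4, frames_available=16),
    Episode(correct=False, frames_observed=1, frames_available=16),   # under-observation
    Episode(correct=True, frames_observed=16, frames_available=16),   # over-observation
    Episode(correct=True, frames_observed=8, frames_available=16),
]
accuracy, obs_ratio = summarize(episodes)
print(f"accuracy={accuracy:.2f}, observation ratio={obs_ratio:.3f}")
```

Under this assumed framing, an agent that watches everything can be accurate but costly, while one that watches almost nothing is cheap but unreliable; reporting both numbers side by side is one way to make that trade-off visible.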
Problem

Research questions and friction points this paper is trying to address.

Living-Screen
GUI agents
short-video platforms
dynamic interfaces
observation control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Living-Screen-Native
GUI agents
dynamic interfaces
observation control
benchmark