PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of effective modeling and evaluation of user-personalized behaviors in existing smartphone GUI agent benchmarks. To bridge this gap, the authors introduce the first benchmark tailored for personalized scenarios, encompassing 10 categories of everyday tasks, 22 applications, and 12,855 fine-grained instructions aligned with real user behavior. They further propose a structure-aware process evaluation framework that integrates user behavior analysis, semantic alignment between instructions and interface elements, and multidimensional testing. Systematic evaluation of 11 state-of-the-art agents reveals significant limitations in handling personalized tasks. The study identifies key directions for improvement: reasoning-driven architectures, enhanced perception capabilities, and the incorporation of reflection and long-term memory mechanisms.
📝 Abstract
Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. However, real-world smartphone use is highly personalized: users adopt diverse workflows and preferences, which requires agents to deliver customized assistance rather than generic solutions. Existing GUI agent benchmarks cannot adequately capture this personalization dimension because user-specific data is sparse and fine-grained evaluation metrics are lacking. To address this gap, we present PSPA-Bench, the first benchmark dedicated to evaluating personalization in smartphone GUI agents. PSPA-Bench comprises 12,855 personalized instructions aligned with real-world user behaviors across 10 representative daily-use scenarios and 22 mobile apps, and introduces a structure-aware process evaluation method that measures agents' personalization capabilities at a fine-grained level. Using PSPA-Bench, we benchmark 11 state-of-the-art GUI agents. Results reveal that current methods perform poorly under personalized settings, with even the strongest agent achieving limited success. Our analysis further highlights three findings that point to directions for advancing personalized GUI agents: (1) reasoning-oriented models consistently outperform general LLMs, (2) perception remains a simple yet critical capability, and (3) reflection and long-term memory mechanisms are key to improving adaptation. Together, these findings establish PSPA-Bench as a foundation for systematic study and future progress in personalized GUI agents.
Problem

Research questions and friction points this paper is trying to address.

personalization
smartphone GUI agent
benchmark
user-specific behavior
evaluation metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized GUI agent
benchmark
structure-aware evaluation
smartphone automation
user behavior modeling
Hongyi Nie
Northwestern Polytechnical University
Xunyuan Liu
Tsinghua University
Yudong Bai
Peking University
Yaqing Wang
Beijing Institute of Mathematical Sciences and Applications (BIMSA)
Machine Learning, Few-shot Learning, Meta Learning, In-context Learning, Cold-start
Yang Liu
Northwestern Polytechnical University
Quanming Yao
Associate Professor, EE Department, Tsinghua University
Machine Learning
Zhen Wang
Northwestern Polytechnical University