🤖 AI Summary
Current LLM-driven GUI agents predominantly operate in a passive, reactive manner, which limits their capacity for general-purpose, efficient information acquisition. To address this, we propose the first GUI agent framework endowed with proactive reasoning capabilities. Our approach integrates three core components: (1) fine-grained GUI state understanding, (2) multi-step, context-aware planning, and (3) execution triggered by predicted user demand. Together, these enable deep information integration across applications and domains, as well as anticipatory task execution. Crucially, rather than waiting for explicit step-by-step guidance as conventional passive agents do, the framework autonomously infers latent user needs from high-level instructions and initiates targeted information retrieval. Empirical evaluation shows substantial improvements in completion efficiency on complex multi-step tasks (+38.2%) and in user satisfaction. The implementation, including source code and demonstration videos, is publicly released.
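To make the reactive/proactive distinction concrete, here is a minimal sketch under loudly stated assumptions. It is not the AppAgent-Pro implementation. Every name here (`RELATED_DOMAINS`, `Step`, `Plan`, `predict_latent_needs`, `plan_reactive`, `plan_proactive`) is hypothetical. A fixed keyword-to-domain table stands in for the LLM-based demand prediction, and planning only emits retrieval steps rather than driving a real GUI.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for LLM demand prediction: keywords in an instruction
# map to related information domains the user did not ask for explicitly.
RELATED_DOMAINS: dict[str, list[str]] = {
    "flight": ["weather", "hotel"],
    "restaurant": ["map", "reviews"],
}


@dataclass
class Step:
    app: str         # hypothetical app/domain the step would run in
    query: str       # what the agent would look up there
    proactive: bool  # True if triggered by a predicted (unstated) need


@dataclass
class Plan:
    steps: list[Step] = field(default_factory=list)


def predict_latent_needs(instruction: str) -> list[str]:
    """Return related domains, deduplicated, in table order (keyword lookup, not an LLM)."""
    text = instruction.lower()
    needs: list[str] = []
    for keyword, domains in RELATED_DOMAINS.items():
        if keyword in text:
            for domain in domains:
                if domain not in needs:
                    needs.append(domain)
    return needs


def plan_reactive(instruction: str) -> Plan:
    """A purely reactive agent executes only the explicit request."""
    return Plan([Step(app="primary", query=instruction, proactive=False)])


def plan_proactive(instruction: str) -> Plan:
    """Explicit request first, then one retrieval step per predicted need."""
    plan = plan_reactive(instruction)
    for domain in predict_latent_needs(instruction):
        plan.steps.append(
            Step(app=domain, query=f"{domain} for: {instruction}", proactive=True)
        )
    return plan


if __name__ == "__main__":
    for step in plan_proactive("Book a flight to Tokyo").steps:
        print(step.app, step.proactive)
```

For "Book a flight to Tokyo", the reactive plan has a single step, while the proactive plan appends weather and hotel lookups the user never requested. In a real system, the lookup table would be replaced by model inference over the instruction and the current GUI state.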
📝 Abstract
Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby helping users obtain more comprehensive and relevant information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life and, in turn, to have a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.