StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

📅 2026-06-05

🤖 AI Summary

This work addresses the sparse-reward problem that GUI agents face in long-horizon, stochastic digital environments. Existing process reward models adapt poorly to multi-path tasks and long-range dependencies because they rely on subjective task segmentation and fixed validation windows. To overcome these limitations, the authors propose StainFlow, a process reward model built on entity stain tracking. StainFlow segments tasks objectively by tracking how the stain concentrations and states of task-relevant entities evolve along the trajectory, and it dynamically constructs high-density evidence windows to verify critical steps. By combining global entity stain trajectories with local evidence linking, the method accommodates multiple valid execution paths. Experiments on AndroidWorld and OGRBench show relative improvements of 3.2% in online reinforcement learning success rate and 1.8% in trajectory completion judgment accuracy over baseline approaches.
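The segmentation idea in this summary can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each trajectory step is a dictionary from entity name to observed state, and it starts a new phase whenever any tracked entity changes state. The names `StepStates` and `segment_phases` and the example entities are illustrative assumptions, not the paper's implementation, which also uses stain concentrations.

```python
from typing import Dict, List

# Hypothetical representation: one dict per trajectory step, mapping an
# entity name to its observed state (an assumption for illustration).
StepStates = Dict[str, str]


def segment_phases(steps: List[StepStates]) -> List[int]:
    """Return the indices of steps that start a new phase.

    A new phase starts whenever any tracked entity changes state relative
    to the previous step. Index 0 always starts the first phase.
    """
    boundaries = [0] if steps else []
    for i in range(1, len(steps)):
        if steps[i] != steps[i - 1]:
            boundaries.append(i)
    return boundaries


# Toy trajectory with two made-up entities.
trajectory = [
    {"search_box": "empty", "contact": "absent"},
    {"search_box": "empty", "contact": "absent"},
    {"search_box": "filled", "contact": "absent"},
    {"search_box": "filled", "contact": "visible"},
    {"search_box": "filled", "contact": "visible"},
]
phases = segment_phases(trajectory)
```

Because boundaries come from observed state changes rather than a predefined milestone list, two trajectories that reach the same goal by different routes can each receive their own segmentation.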

📝 Abstract

Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation. However, these methods still suffer from two level-specific limitations. At the global level, milestone decomposition is subjective and yields a single decomposition, making it difficult to accommodate the multiple valid execution paths in real GUI tasks. At the local level, fixed judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames. Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents. To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow. To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow achieves relative improvements of 3.2% in online RL success rate and 1.8% in trajectory completion judgment accuracy.
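To make the contrast between a fixed judging window and a dynamically built evidence window concrete, here is a minimal sketch. It assumes each step carries a scalar stain concentration for the triggering entity of a candidate key node. The function name `build_evidence_window`, the threshold, the window size, and the numbers are all hypothetical, and the abstract does not specify how StainFlow actually scores or retrieves steps.

```python
from typing import List


def build_evidence_window(
    concentrations: List[float],
    node_step: int,
    threshold: float = 0.5,
    max_steps: int = 4,
) -> List[int]:
    """Select evidence steps for a candidate key node (illustrative only).

    Keeps the candidate step itself plus up to max_steps - 1 earlier steps
    whose concentration for the triggering entity is at least `threshold`,
    preferring higher concentration and then more recent steps. The result
    is returned in chronological order.
    """
    earlier = [i for i in range(node_step) if concentrations[i] >= threshold]
    earlier.sort(key=lambda i: (concentrations[i], i), reverse=True)
    return sorted(earlier[: max_steps - 1] + [node_step])


# Made-up concentrations for one triggering entity over seven steps.
conc = [0.9, 0.1, 0.0, 0.2, 0.7, 0.1, 0.8]
window = build_evidence_window(conc, node_step=6, threshold=0.5, max_steps=3)
fixed_window = list(range(6 - 3 + 1, 6 + 1))  # last three steps: [4, 5, 6]
```

In this toy case the fixed window covers steps 4 to 6 and includes the low-relevance step 5, while the concentration-based window keeps steps 0, 4, and 6, retaining the long-range evidence at step 0.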

Problem

Research questions and friction points this paper is trying to address.

Process Reward Models

Credit Assignment

GUI Agents

Reinforcement Learning

Sparse Rewards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Process Reward Model

Entity-Stain Tracking

Evidence Linking