🤖 AI Summary
This work addresses a weakness of existing GUI agents: GUI data are non-stationary, and newly emerging interfaces (for example, new domains or screen resolutions) cause distribution shifts that degrade generalization and continual learning. To mitigate this, the authors propose GUI-AC, a continual learning method built on reinforcement fine-tuning (RFT). It introduces “grounding certainty” to support two mechanisms: Adaptive Advantage, which down-weights noisy advantage estimates, and Dynamic Clipping, which relaxes the clipping bound to widen exploration. Together, these mechanisms target policy overconfidence and exploration collapse. The authors report that the method outperforms state-of-the-art baselines in their experiments.
📝 Abstract
Graphical User Interfaces (GUIs) serve as the dominant medium for human-computer interaction, yet building GUI agents that generalize across the vast diversity of real-world interface environments, with the same flexibility and robustness that humans naturally exhibit, remains an open problem. Notably, GUI data are inherently non-stationary: the continual emergence of previously unseen interface instances (e.g., novel domains and resolutions) induces persistent distribution shifts, significantly impeding the continual learning of existing GUI agents. Reinforcement fine-tuning (RFT) has attracted considerable attention as a promising approach. Nevertheless, RFT exhibits pronounced instability in its grounding capability, manifested as sharp reward discontinuities and high-variance oscillations. First, the imbalanced distribution of rollout outcomes introduces substantial noise into advantage estimation, leading to policy overconfidence. Second, the fixed clipping bound suppresses the increase in policy probabilities needed to adapt to new distributions, leading to a collapse in exploration capacity. To address these challenges, we propose GUI-AC, a method that enhances the continual learning capability of GUI agents. GUI-AC introduces grounding certainty to support two core mechanisms: (i) Adaptive Advantage, which down-weights noisy advantage estimates to prevent policy overconfidence; and (ii) Dynamic Clipping, which relaxes the clipping bound to widen the exploration range. Extensive experiments show that these mechanisms jointly improve performance, enabling our method to surpass state-of-the-art baselines. Code is available anonymously at https://anonymous.4open.science/r/GUI-AC.
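The abstract does not give formulas for either mechanism, so the following is only an illustrative sketch of the general idea, not GUI-AC's actual definitions. It assumes a GRPO-style group-normalized advantage and a PPO-style clipped surrogate. The `certainty` argument (a scalar in [0, 1]), the linear weighting, the `relax` parameter, and the direction in which certainty affects the bound are all hypothetical choices made for this example.

```python
import numpy as np


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages for the rollouts of one prompt (GRPO-style)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def adaptive_advantages(rewards, certainty):
    """Down-weight advantages by a certainty score in [0, 1].

    ASSUMPTION: a simple multiplicative weight; the paper's weighting
    function is not stated in the abstract.
    """
    return certainty * group_advantages(rewards)


def dynamic_clip_objective(ratio, advantage, certainty,
                           eps_low=0.2, eps_high=0.2, relax=0.2):
    """PPO clipped surrogate with a relaxed upper clipping bound.

    ASSUMPTION: the upper bound widens linearly as certainty drops,
    so probabilities can rise further where the policy is less certain.
    """
    upper = 1.0 + eps_high + relax * (1.0 - certainty)
    clipped = np.clip(ratio, 1.0 - eps_low, upper)
    return np.minimum(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Imbalanced rollout outcomes: one success among four attempts.
    rewards = [1.0, 0.0, 0.0, 0.0]
    print(adaptive_advantages(rewards, certainty=0.5))
    # Same ratio and advantage, evaluated under two certainty levels.
    print(dynamic_clip_objective(1.5, 1.0, certainty=1.0))
    print(dynamic_clip_objective(1.5, 1.0, certainty=0.0))
```

With a fixed bound (`certainty=1.0` here), a probability ratio of 1.5 is cut off at 1.2. With the relaxed bound it is cut off at 1.4, which is the kind of extra headroom the abstract describes as necessary for adapting to new distributions.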