🤖 AI Summary
Existing computational user models struggle to simulate human visual sampling and decision-making under time pressure in dynamic, pixel-level interactive environments. This work proposes CR-Eyes, a reinforcement learning model grounded in the computational rationality framework that treats eye movements as goal-directed actions. CR-Eyes jointly learns “where to look” and “how to act” end to end in Atari games, establishing the first unified closed-loop model of visual sampling and action selection under perceptual and cognitive constraints. Experiments show that the model closely matches human behavior in task performance and aggregate saliency patterns while revealing systematic differences in scanpaths. These findings offer a theory-driven paradigm for the design of interactive systems.
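To make the “where to look / how to act” coupling concrete, here is a minimal PyTorch sketch of a joint gaze-and-action policy: one shared visual trunk feeding two categorical heads, one over a coarse grid of fixation targets and one over the game's discrete actions. The network shape, the 84×84 grayscale input, the 8×8 gaze grid, and all names (`GazeActionPolicy`, etc.) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GazeActionPolicy(nn.Module):
    """Joint policy: one shared trunk, two heads ("where to look" and "how to act")."""

    def __init__(self, n_game_actions: int, gaze_grid: int = 8):
        super().__init__()
        # Shared convolutional trunk over the (foveated) frame.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened feature size from a dummy frame
            feat = self.trunk(torch.zeros(1, 1, 84, 84)).shape[1]
        # Gaze head: categorical distribution over a coarse grid of fixation targets.
        self.gaze_head = nn.Linear(feat, gaze_grid * gaze_grid)
        # Motor head: categorical distribution over the game's discrete actions.
        self.action_head = nn.Linear(feat, n_game_actions)

    def forward(self, frame: torch.Tensor):
        h = self.trunk(frame)
        return self.gaze_head(h), self.action_head(h)

# One decision step: sample a fixation target and a game action from the same state.
policy = GazeActionPolicy(n_game_actions=4)
gaze_logits, action_logits = policy(torch.zeros(1, 1, 84, 84))
gaze = torch.distributions.Categorical(logits=gaze_logits).sample()
action = torch.distributions.Categorical(logits=action_logits).sample()
```

In a model like this, the sampled fixation determines what the next (foveated) frame reveals, so gaze choices feed back into action choices; that feedback is what “closing the perception-action loop” refers to.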
📝 Abstract
Designing mobile and interactive technologies requires understanding how users sample dynamic environments to acquire information and make decisions under time pressure. However, existing computational user models either rely on hand-crafted task representations or are limited to static or non-interactive visual inputs, restricting their applicability to realistic, pixel-based environments. We present CR-Eyes, a computationally rational model that simulates visual sampling and gameplay behavior in Atari games. Trained via reinforcement learning, CR-Eyes operates under perceptual and cognitive constraints and jointly learns where to look and how to act in a time-sensitive setting. By explicitly closing the perception-action loop, the model treats eye movements as goal-directed actions rather than as isolated saliency predictions. Our evaluation shows strong alignment with human data in task performance and aggregate saliency patterns, while also revealing systematic differences in scanpaths. CR-Eyes is a step toward scalable, theory-grounded user models that support design and evaluation of interactive systems.
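The perceptual constraints mentioned in the abstract can be illustrated with a retina-like observation model: full resolution inside a small foveal window around the current gaze, coarse block-averaged resolution in the periphery. The NumPy sketch below is our own illustration under stated assumptions (84×84 frames, a 16-pixel fovea, 4× peripheral downsampling); the paper's actual constraint model may differ.

```python
import numpy as np

def foveate(frame: np.ndarray, gaze_yx: tuple, fovea: int = 16, factor: int = 4) -> np.ndarray:
    """Retina-like observation: sharp inside the fovea, blurred in the periphery.

    Assumed parameters (not from the paper): frame dims divisible by `factor`,
    a square fovea of `fovea` pixels centred on the gaze point.
    """
    h, w = frame.shape
    assert h % factor == 0 and w % factor == 0
    # Periphery: block-average to a coarse grid, then upsample back (a cheap blur).
    coarse = frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    out = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    # Fovea: paste the full-resolution patch around the current gaze point.
    y, x = gaze_yx
    y0, x0 = max(0, y - fovea // 2), max(0, x - fovea // 2)
    y1, x1 = min(h, y0 + fovea), min(w, x0 + fovea)
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out

frame = np.random.rand(84, 84).astype(np.float32)
obs = foveate(frame, gaze_yx=(40, 40))  # what the agent "sees" given its fixation
```

Feeding observations like `obs` (rather than the raw frame) to the policy is one simple way to make looking costly: information away from the fixation is degraded, so the agent must learn fixation strategies that are rational given the task.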