🤖 AI Summary
This study addresses the confounding influence of center bias on the evaluation of visual attention models, which often leads to an overestimation of their alignment with human eye movement patterns. To mitigate this issue, the authors propose a center-debiased evaluation metric, the Gaze Consistency Score (GCS), and systematically analyze scanpaths generated by a hard-attention classifier on the Gaze-CIFAR-10 dataset while varying foveal patch size and peripheral context. Their findings indicate that scanpaths most closely matching the temporal characteristics of human eye movements emerge when foveal and peripheral information are combined at a medium patch size. In contrast, an excessively large field of view encourages shortcut strategies. The GCS metric separates genuine behavioral alignment from artifacts induced by center bias and identifies a peripheral “sweet spot” where the generated scanpaths are most human-like.
📝 Abstract
Human eye movements in visual recognition reflect a balance between foveal sampling and peripheral context. Task-driven hard-attention models for vision are often evaluated by how well their scanpaths match human gaze. However, common scanpath metrics can be strongly confounded by dataset-specific center bias, especially on object-centric datasets. Using Gaze-CIFAR-10, we show that a trivial center-fixation baseline achieves surprisingly strong scanpath scores, approaching those of many learned policies. This makes standard metrics optimistic and blurs the distinction between genuine behavioral alignment and mere central tendency. We then analyze a hard-attention classifier under constrained vision by sweeping foveal patch size and peripheral context, revealing a peripheral sweet spot: only a narrow range of sensory constraints yields scanpaths that are simultaneously (i) above the center baseline after debiasing and (ii) temporally human-like in movement statistics. To address center bias, we propose GCS (Gaze Consistency Score), a center-debiased composite metric augmented with movement similarity. GCS uncovers a robust sweet spot at medium patch size with both foveal and peripheral vision that is not obvious from raw scanpath metrics or accuracy alone, and also highlights a "shortcut regime" when the field of view becomes too large. We discuss implications for evaluating active perception on object-centric datasets and for designing gaze benchmarks that better separate behavioral alignment from center bias.
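To make the center-bias confound concrete, the following is a minimal sketch of how a center-fixation baseline can be scored and then used to rescale a raw scanpath similarity. It is an illustration only and not the paper's GCS definition, which the abstract does not spell out: the distance-based `similarity`, the baseline-relative rescaling in `debiased_similarity`, the function names, and the toy fixation coordinates are all assumptions made for this example.

```python
import numpy as np

def similarity(pred, human):
    """Toy scanpath similarity in [0, 1] (an assumed stand-in metric).

    Fixations are (x, y) pairs in normalized image coordinates. The score is
    1 minus the mean distance between time-aligned fixations, divided by the
    unit-square diagonal sqrt(2).
    """
    pred = np.asarray(pred, dtype=float)
    human = np.asarray(human, dtype=float)
    n = min(len(pred), len(human))  # compare only the overlapping steps
    dist = np.linalg.norm(pred[:n] - human[:n], axis=1).mean()
    return 1.0 - float(dist) / np.sqrt(2.0)

def center_scanpath(length):
    """Trivial baseline: fixate the image center at every step."""
    return np.full((length, 2), 0.5)

def debiased_similarity(pred, human):
    """Rescale the raw score so the center baseline maps to 0 and a perfect
    match maps to 1 (hypothetical normalization, not the paper's formula)."""
    base = similarity(center_scanpath(len(human)), human)
    raw = similarity(pred, human)
    if base >= 1.0:  # human gaze never leaves the center; nothing to gain
        return 0.0
    return (raw - base) / (1.0 - base)

# Toy human scanpath on an object-centric image: gaze stays near the center.
human = np.array([[0.50, 0.50], [0.60, 0.50], [0.55, 0.45]])
center = center_scanpath(len(human))

raw_center = similarity(center, human)          # high despite no learning
deb_center = debiased_similarity(center, human)  # zero by construction
deb_perfect = debiased_similarity(human, human)  # one by construction
```

In this toy setup the center baseline obtains a high raw score simply because the human fixations cluster near the middle of the image, while the rescaled score assigns it zero. A policy therefore has to beat the center baseline to receive any credit, which is the kind of separation between behavioral alignment and central tendency that the abstract argues for.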