SIGN: A Statistically-Informed Gaze Network for Gaze Time Prediction

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses visual attention modeling in images: predicting group-average gaze durations and recovering the underlying region-level fixation probability maps from those aggregate durations. We propose a hybrid framework that integrates statistical modeling with deep learning: a CNN-Transformer architecture designed jointly with an interpretable loss function that encodes statistical properties of eye-tracking data, so that region-level fixation probabilities (over all possible scan-paths) can be inferred from aggregate gaze durations alone. Evaluated on AdGaze3500 and COCO-Search18, our method significantly outperforms existing state-of-the-art approaches. The generated fixation probability maps align closely with ground-truth eye-movement patterns, with a 23.6% reduction in KL divergence and a 4.1% improvement in AUC, indicating that the model accurately characterizes human visual search behavior.
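
The summary reports KL divergence and AUC as the headline metrics for comparing predicted and empirical fixation maps. As a point of reference only, here is a minimal sketch of a standard KL-divergence comparison between a predicted and a ground-truth fixation probability map; the paper's exact normalisation and evaluation protocol are not given here, so the formulation below is an assumption.

```python
import numpy as np

def fixation_map_kl(pred, target, eps=1e-8):
    """KL(target || pred) between two fixation probability maps.

    A common saliency-evaluation formulation; the divergence direction and
    normalisation used in the paper are assumed, not confirmed.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    pred = pred / (pred.sum() + eps)       # normalise to probability maps
    target = target / (target.sum() + eps)
    return float(np.sum(target * np.log((target + eps) / (pred + eps))))
```

Lower values indicate closer agreement with the empirical fixation map.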

📝 Abstract
We propose a first version of SIGN, a Statistically-Informed Gaze Network, to predict aggregate gaze times on images. We develop a foundational statistical model for which we derive a deep learning implementation involving CNNs and Visual Transformers, which enables the prediction of overall gaze times. The model enables us to derive from the aggregate gaze times the underlying gaze pattern as a probability map over all regions in the image, where each region's probability represents the likelihood of being gazed at across all possible scan-paths. We test SIGN's performance on AdGaze3500, a dataset of images of ads with aggregate gaze times, and on COCO-Search18, a dataset with individual-level fixation patterns collected during search. We demonstrate that SIGN (1) improves gaze duration prediction significantly over state-of-the-art deep learning benchmarks on both datasets, and (2) can deliver plausible gaze patterns that correspond to empirical fixation patterns in COCO-Search18. These results suggest that the first version of SIGN holds promise for gaze-time predictions and deserves further development.
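
The abstract describes the architecture only at a high level: a CNN feeding a Visual Transformer that predicts aggregate gaze times and yields a region-level fixation probability map. As a rough illustration of that kind of pipeline, the hypothetical PyTorch sketch below maps an image to per-region probabilities plus an aggregate gaze-time estimate; all layer choices, dimensions, and the softmax-over-regions head are assumptions for illustration, not the authors' SIGN implementation or its statistically-informed loss.

```python
import torch
import torch.nn as nn

class GazeTimeSketch(nn.Module):
    """Hypothetical CNN + Transformer sketch: image -> region fixation
    probabilities and an aggregate gaze-time estimate (illustrative only)."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Small CNN stem: downsample the image into a grid of region features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer encoder over the flattened grid of region tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Per-region score head; softmax turns scores into a probability map.
        self.region_score = nn.Linear(d_model, 1)
        # Scalar head for the aggregate gaze time.
        self.gaze_time = nn.Linear(d_model, 1)

    def forward(self, images):
        feats = self.cnn(images)                   # (B, C, h, w)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, C) region tokens
        tokens = self.encoder(tokens)              # contextualised regions
        scores = self.region_score(tokens).squeeze(-1)   # (B, h*w)
        region_probs = torch.softmax(scores, dim=-1)     # fixation probability map
        pooled = tokens.mean(dim=1)                      # image-level summary
        total_time = self.gaze_time(pooled).squeeze(-1)  # aggregate gaze time
        return region_probs.view(b, h, w), total_time


if __name__ == "__main__":
    # Example usage on a random batch of 224x224 images.
    model = GazeTimeSketch()
    probs, times = model(torch.randn(2, 3, 224, 224))
    print(probs.shape, times.shape)  # torch.Size([2, 28, 28]) torch.Size([2])
```

In such a setup, only the scalar gaze-time output would need aggregate supervision, while the softmax head provides the region-level probability map the abstract refers to; how SIGN's statistical model actually ties the two together is not specified here.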
Problem

Research questions and friction points this paper is trying to address.

Visual Attention
Image Viewing Prediction
Gaze Pattern Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

SIGN model
Visual Attention Prediction
Deep Learning Integration