Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention

📅 2025-07-24
🤖 AI Summary
This work addresses the prediction of human eye-movement scanpaths in target-present visual search. It evaluates SemBA-FAST, a top-down framework that combines deep object detection, probabilistic semantic fusion, and biologically inspired foveal modeling. The model builds an initial top-down semantic attention map and refines its fixation distribution step by step through artificial foveation, without relying on full-sequence eye-tracking priors. The design is intended to align more closely with human visual cognition than earlier top-down approaches. On COCO-Search18, the predicted fixation sequences closely match human ground-truth scanpaths: SemBA-FAST surpasses baseline and other top-down methods and, in some cases, competes with models that are informed by human scanpath data.

📝 Abstract
In goal-directed visual tasks, human perception is guided by both top-down and bottom-up cues. At the same time, foveal vision plays a crucial role in directing attention efficiently. Modern research on bio-inspired computational attention models has taken advantage of advancements in deep learning by utilizing human scanpath data to achieve new state-of-the-art performance. In this work, we assess the performance of SemBA-FAST (Semantic-based Bayesian Attention for Foveal Active visual Search Tasks), a top-down framework designed for predicting human visual attention in target-present visual search. SemBA-FAST integrates deep object detection with a probabilistic semantic fusion mechanism to generate attention maps dynamically, leveraging pre-trained detectors and artificial foveation to update top-down knowledge and improve fixation prediction sequentially. We evaluate SemBA-FAST on the COCO-Search18 benchmark dataset, comparing its performance against other scanpath prediction models. Our methodology achieves fixation sequences that closely match human ground-truth scanpaths. Notably, it surpasses baseline and other top-down approaches and competes, in some cases, with scanpath-informed models. These findings provide valuable insights into the capabilities of semantic-foveal probabilistic frameworks for human-like attention modelling, with implications for real-time cognitive computing and robotics.
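The sequential loop described in the abstract (look, update top-down belief with foveated evidence, choose the next fixation) can be illustrated with a toy sketch. This is not the authors' implementation: the grid world, the Gaussian `acuity` falloff, the `hit` and `false_alarm` rates, the deterministic simulated detector, and the function names are all assumptions made for illustration, and the greedy argmax policy stands in for whatever fixation-selection rule the paper actually uses.

```python
import math

def acuity(cell, fixation, sigma=1.0):
    """Hypothetical foveal reliability: 1 at the fixated cell, decaying with eccentricity."""
    d2 = (cell[0] - fixation[0]) ** 2 + (cell[1] - fixation[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def update_belief(belief, fixation, target, hit=0.95, false_alarm=0.05):
    """One Bayesian update of P(target in cell) from a simulated foveated detector."""
    posterior = {}
    for cell, prior in belief.items():
        a = acuity(cell, fixation)
        # Detection probability degrades toward the false-alarm rate in the periphery.
        p_hit = false_alarm + a * (hit - false_alarm)
        # Toy observation (assumption): the target is detected only near the fovea.
        detected = cell == target and a > 0.5
        like_target = p_hit if detected else 1.0 - p_hit
        like_other = false_alarm if detected else 1.0 - false_alarm
        evidence = prior * like_target + (1.0 - prior) * like_other
        posterior[cell] = prior * like_target / evidence
    return posterior

def predict_scanpath(prior, start, target, max_fixations=10):
    """Greedy search: fixate the most probable cell until the target is fixated."""
    belief = dict(prior)
    fixation = start
    scanpath = [fixation]
    while fixation != target and len(scanpath) < max_fixations:
        belief = update_belief(belief, fixation, target)
        fixation = max(belief, key=belief.get)
        scanpath.append(fixation)
    return scanpath

# Top-down prior on a 5x5 grid: a strong distractor at (0, 0), the target at (4, 4).
prior = {(r, c): 0.1 for r in range(5) for c in range(5)}
prior[(0, 0)] = 0.4
prior[(4, 4)] = 0.3
scanpath = predict_scanpath(prior, start=(2, 2), target=(4, 4))
print(scanpath)
```

In this toy setup the model first visits the distractor because the prior favors it, the foveal non-detection there sharply lowers that cell's belief, and the next fixation lands on the target. Peripheral cells barely change because their observations carry little information, which is the role foveation plays in the sketch.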

Problem

Research questions and friction points this paper is trying to address.

Predict human visual attention in target-present search
Integrate deep object detection with semantic fusion
Improve fixation prediction using foveal vision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates deep object detection with probabilistic fusion
Uses artificial foveation to update top-down knowledge
Leverages pre-trained detectors for dynamic attention maps
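
As a rough illustration of what probabilistic fusion of detector outputs across fixations might look like at a single image location, the sketch below multiplies a categorical belief by detector class scores that are tempered by a foveal reliability weight. The tempering rule (`scores ** reliability`), the class names, the score values, and the function name `fuse` are assumptions for this example; this page does not specify the paper's actual fusion rule.

```python
def fuse(prior, scores, reliability):
    """Categorical Bayesian fusion: posterior is proportional to prior * scores**reliability.

    reliability in [0, 1] is a hypothetical foveal weight: 1 for a foveated view,
    near 0 for a blurry peripheral view (which then barely changes the belief).
    """
    unnormalized = [p * (s ** reliability) for p, s in zip(prior, scores)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

classes = ["cup", "bottle", "background"]
belief = [1.0 / 3.0] * 3  # uninformative prior over classes at this location

# Peripheral glimpse: the detector weakly suggests "bottle", but reliability is low.
belief = fuse(belief, [0.2, 0.6, 0.2], reliability=0.2)
after_periphery = classes[belief.index(max(belief))]

# Foveated view of the same location: the detector now favors "cup".
belief = fuse(belief, [0.8, 0.1, 0.1], reliability=1.0)
after_fovea = classes[belief.index(max(belief))]

print(after_periphery, after_fovea)
```

The peripheral observation nudges the belief toward "bottle" only slightly, so a single foveated observation is enough to overturn it in favor of "cup". Down-weighting low-acuity evidence in this way is one simple option for letting foveation update top-down knowledge.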