SalFormer360: a transformer-based saliency estimation model for 360-degree videos

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited accuracy of saliency estimation in 360-degree videos by proposing a novel Transformer-based architecture. It is the first to adapt the SegFormer encoder to this task, integrating a custom-designed decoder and a viewing-center bias mechanism to effectively model human gaze behavior. The proposed method significantly outperforms state-of-the-art approaches on three benchmark datasets—Sport360, PVS-HM, and VR-EyeTracking—achieving relative improvements of 8.4%, 2.5%, and 18.6% in Pearson Correlation Coefficient, respectively. These results demonstrate its superior capability in capturing attention patterns, thereby providing a more accurate attention prior for viewport prediction and immersive content optimization.

📝 Abstract
Saliency estimation has received growing attention in recent years due to its importance in a wide range of applications. In the context of 360-degree video, it is particularly valuable for tasks such as viewport prediction and immersive content optimization. In this paper, we propose SalFormer360, a novel transformer-based saliency estimation model for 360-degree videos. Our approach combines an existing encoder, SegFormer, with a custom decoder. SegFormer was originally developed for 2D segmentation tasks and is fine-tuned here to adapt it to 360-degree content. To further improve prediction accuracy, we incorporate a Viewing Center Bias that reflects user attention in 360-degree environments. Extensive experiments on the three largest benchmark datasets for saliency estimation demonstrate that SalFormer360 outperforms existing state-of-the-art methods: in terms of Pearson Correlation Coefficient, it achieves relative improvements of 8.4% on Sport360, 2.5% on PVS-HM, and 18.6% on VR-EyeTracking over the previous state of the art.
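The abstract describes the Viewing Center Bias only at a high level. A common way to realize such a prior for equirectangular frames is a Gaussian weighting centered on the front/equator point of the frame, multiplied element-wise into the predicted saliency map. The sketch below is an illustrative assumption of that generic approach, not the paper's actual formulation; the function names and the `sigma_lat`/`sigma_lon` parameters are hypothetical.

```python
import math

def center_bias_map(height, width, sigma_lat=0.3, sigma_lon=0.5):
    """Gaussian prior peaked at the equator/front of an equirectangular frame.

    sigma_lat/sigma_lon are fractions of the half-extent in each direction;
    the values here are illustrative, not taken from the paper.
    """
    bias = []
    for r in range(height):
        # Normalized latitude: -1 at the top row, +1 at the bottom row.
        lat = (r + 0.5) / height * 2.0 - 1.0
        row = []
        for c in range(width):
            # Normalized longitude: -1 at the left edge, +1 at the right edge.
            lon = (c + 0.5) / width * 2.0 - 1.0
            w = math.exp(-0.5 * ((lat / sigma_lat) ** 2 + (lon / sigma_lon) ** 2))
            row.append(w)
        bias.append(row)
    return bias

def apply_center_bias(saliency, bias):
    """Element-wise modulation of a predicted saliency map by the bias prior."""
    return [[s * b for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(saliency, bias)]
```

In practice such a prior is either multiplied into the network output at inference time, as above, or concatenated as an extra input channel so the model can learn how strongly to rely on it.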
Problem

Research questions and friction points this paper is trying to address.

saliency estimation
360-degree videos
viewport prediction
immersive content optimization
visual attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

transformer-based architecture
360-degree video saliency
SegFormer adaptation
Viewing Center Bias
saliency estimation
Mahmoud Z. A. Wahba
Department of Information Engineering, University of Padova, Via Gradenigo 6b, 35131, Padua, Italy
Francesco Barbato
Department of Information Engineering, University of Padova, Via Gradenigo 6b, 35131, Padua, Italy
Sara Baldoni
Department of Information Engineering, University of Padova, Via Gradenigo 6b, 35131, Padua, Italy
Federica Battisti
Associate Professor, Department of Information Engineering, University of Padova