🤖 AI Summary
To address the limited modeling of spherical distortion and the weak coupling of spatiotemporal attention in 360° video saliency prediction, this paper proposes Sphere-GAN, the first end-to-end saliency prediction model to integrate spherical convolution with a generative adversarial network (GAN). A spherical convolutional module captures the geometric structure and dynamic features of omnidirectional video, and an embedded spatiotemporal attention mechanism strengthens the coupling between spatial and temporal cues. Spherical convolutions are incorporated into both the GAN generator and the discriminator within a unified adversarial training framework, improving the structural fidelity and fine-grained consistency of the predicted saliency maps through adversarial learning. Extensive experiments on a public 360° video saliency dataset demonstrate significant improvements over state-of-the-art methods on standard metrics, including KL divergence, correlation coefficient (CC), and similarity (SIM), confirming the model's robustness to spherical distortion and its prediction accuracy.
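To make the architecture concrete, below is a minimal PyTorch sketch of how spherical convolutions and spatiotemporal attention might be wired into a GAN generator and discriminator. This is not the authors' implementation: the `SphereConv2d` layer here only approximates spherical convolution by padding equirectangular feature maps circularly along longitude, and the attention design, layer names, and channel counts are hypothetical placeholders.

```python
# Illustrative sketch only; SphereConv2d approximates spherical convolution
# with longitude-wrapping padding, not the paper's actual spherical kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereConv2d(nn.Module):
    """Conv2d with circular padding along longitude (width) to reduce
    equirectangular seam artifacts; a stand-in for true spherical conv."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=0)
        self.pad = k // 2

    def forward(self, x):
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")   # wrap longitude
        x = F.pad(x, (0, 0, self.pad, self.pad), mode="replicate")  # clamp latitude
        return self.conv(x)

class SpatioTemporalAttention(nn.Module):
    """Toy attention: per-pixel weights from features pooled over time."""
    def __init__(self, ch):
        super().__init__()
        self.att = nn.Sequential(SphereConv2d(ch, 1), nn.Sigmoid())

    def forward(self, x):            # x: (B, T, C, H, W)
        pooled = x.mean(dim=1)       # aggregate the temporal axis
        return pooled * self.att(pooled)  # reweight by (B, 1, H, W) map

class Generator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc = SphereConv2d(3, ch)
        self.attn = SpatioTemporalAttention(ch)
        self.head = nn.Sequential(SphereConv2d(ch, 1), nn.Sigmoid())

    def forward(self, frames):                   # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.enc(frames.flatten(0, 1)).view(b, t, -1, h, w)
        return self.head(self.attn(feats))       # (B, 1, H, W) saliency map

class Discriminator(nn.Module):
    """Scores a frame/saliency-map pair; also built from spherical convs."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            SphereConv2d(4, ch), nn.LeakyReLU(0.2),
            SphereConv2d(ch, 1),
        )

    def forward(self, frame, saliency):          # (B, 3, H, W) + (B, 1, H, W)
        return self.net(torch.cat([frame, saliency], dim=1)).mean(dim=(1, 2, 3))

g = Generator()
maps = g(torch.randn(2, 5, 3, 64, 128))  # 2 clips of 5 equirectangular frames
print(maps.shape)                        # torch.Size([2, 1, 64, 128])
```

The circular padding is a cheap way to respect the fact that the left and right edges of an equirectangular frame are adjacent on the sphere; proper spherical convolutions additionally adapt the kernel footprint to latitude-dependent distortion.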
📝 Abstract
The recent success of immersive applications is pushing the research community to define new approaches for processing 360° images and videos and optimizing their transmission. Among these, saliency estimation is a powerful tool that can identify visually relevant areas and, consequently, guide the adaptation of processing algorithms. Although saliency estimation has been widely investigated for 2D content, very few algorithms have been proposed for 360° content. Towards this goal, we introduce Sphere-GAN, a saliency detection model for 360° videos that leverages a Generative Adversarial Network with spherical convolutions. Extensive experiments were conducted on a public 360° video saliency dataset, and the results demonstrate that Sphere-GAN outperforms state-of-the-art models in accurately predicting saliency maps.
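The three metrics reported above have standard definitions in the saliency literature. The NumPy sketch below shows one common way to compute them; the paper's exact normalization and epsilon choices are not specified here, so those details are assumptions.

```python
# Standard saliency-metric definitions (sketch); normalization choices assumed.
import numpy as np

def _as_distribution(s, eps=1e-8):
    """Normalize a non-negative saliency map so it sums to 1."""
    s = s.astype(np.float64)
    return s / (s.sum() + eps)

def kl_divergence(pred, gt, eps=1e-8):
    """KL(gt || pred) between normalized maps: lower is better."""
    p, q = _as_distribution(gt), _as_distribution(pred)
    return float(np.sum(p * np.log(eps + p / (q + eps))))

def correlation_coefficient(pred, gt):
    """Pearson CC between the two maps: higher is better."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def similarity(pred, gt):
    """SIM: sum of element-wise minima of the normalized maps, in [0, 1]."""
    p, g = _as_distribution(pred), _as_distribution(gt)
    return float(np.minimum(p, g).sum())

# Example on random maps shaped like a small equirectangular frame
pred, gt = np.random.rand(240, 480), np.random.rand(240, 480)
print(kl_divergence(pred, gt), correlation_coefficient(pred, gt), similarity(pred, gt))
```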