ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement

📅 2026-05-27
🤖 AI Summary
This study addresses the prediction of viewers' dynamic visual attention and engagement in short-form video advertisements. The authors propose ViASNet, a deep network built on the 3D U-Net architecture that incorporates audio and scene-level semantic information to generate dynamic saliency maps modeling gaze behavior. They also use the frame-by-frame entropy of the predicted saliency maps as a diagnostic for viewer engagement. The model is evaluated on eye-tracking data from 151 advertisements, each seen by about 20 viewers, and ablation experiments examine the factors that drive its performance. The entropy diagnostic is illustrated on 15 unseen ads to identify ads and scenes that fail to engage viewers, suggesting that automated systems built on such models can considerably speed up ad design and testing.
📝 Abstract
The digital media landscape has seen a pervasive shift toward short-form video advertising on TV, social media and e-commerce platforms. The present study focuses on deep saliency prediction for short-form video advertising. Deep saliency models have been used to generate predictions of human eye fixation patterns with the purpose of enhancing user interaction with digital technology and optimizing its design. For video ads, dynamic saliency maps capture where and when viewers are looking, revealing why video ads are effective, and how their content should be optimized. We develop and test a new deep dynamic saliency prediction model called ViASNet (Video Ad Saliency Network), which has an architecture founded on the 3D U-Net, and accommodates the influence of audio and the semantic meaning of scenes. We assess the model's performance on 151 video ads, each seen by about 20 viewers while their eye movements were tracked, and explore the critical factors influencing model performance through ablation experiments. We calculate the entropy of the predicted saliency maps frame-by-frame as a diagnostic tool to identify ads and scenes that fail to engage viewers, and illustrate its use on test data of 15 unseen ads. Our study reveals that ad design and testing can be sped up considerably through automated systems built on deep saliency models such as ViASNet.
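The frame-by-frame entropy diagnostic mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the standard Shannon entropy of each frame's saliency map after normalizing it to a probability distribution over pixels. The function names `frame_entropy` and `flag_frames`, the `(T, H, W)` array layout, and the threshold rule are all hypothetical and are not taken from the paper. In particular, treating high entropy (dispersed predicted attention) as the warning sign is an assumption made here for illustration.

```python
import numpy as np


def frame_entropy(saliency):
    """Shannon entropy in bits for each frame of a (T, H, W) saliency volume.

    Assumption: each frame is normalized to sum to 1 and treated as a
    probability distribution over pixels. This is not the paper's stated
    formula, only a common way to define saliency-map entropy.
    """
    s = np.asarray(saliency, dtype=np.float64)
    if s.ndim != 3:
        raise ValueError("expected an array of shape (T, H, W)")
    flat = np.clip(s.reshape(s.shape[0], -1), 0.0, None)
    totals = flat.sum(axis=1, keepdims=True)
    n_pixels = flat.shape[1]
    # Frames with no saliency mass are treated as uniform (maximum entropy).
    safe_totals = np.where(totals > 0, totals, 1.0)
    p = np.where(totals > 0, flat / safe_totals, 1.0 / n_pixels)
    # Use the convention 0 * log(0) = 0 by skipping zero-probability pixels.
    log_p = np.log2(p, where=p > 0, out=np.zeros_like(p))
    return -(p * log_p).sum(axis=1)


def flag_frames(entropy, threshold):
    """Indices of frames whose entropy exceeds a hypothetical threshold.

    Assumption: high entropy means predicted gaze is spread out, which is
    read here as a possible sign of weak engagement.
    """
    return np.flatnonzero(np.asarray(entropy) > threshold)


if __name__ == "__main__":
    # Toy volume: frame 0 is uniform, frame 1 has a single salient pixel.
    volume = np.zeros((2, 4, 4))
    volume[0] = 1.0
    volume[1, 0, 0] = 1.0
    h = frame_entropy(volume)
    print(h)
    print(flag_frames(h, threshold=2.0))
```

In this toy case the uniform 4x4 frame reaches the maximum entropy of log2(16) = 4 bits, while the frame with a single salient pixel has entropy 0. With real predictions, a threshold would need to be chosen from data, for example relative to the entropy distribution over a reference set of ads.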
Problem

Research questions and friction points this paper is trying to address.

video advertising
dynamic saliency prediction
viewer engagement
eye fixation
short-form video
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic saliency prediction
video advertising
audio-visual integration
3D U-Net
saliency entropy