🤖 AI Summary
To address the challenges of modeling regional quality heterogeneity and the scarcity of region-level annotations in video quality assessment (VQA), this paper proposes KVQ, a saliency-driven local quality perception framework. KVQ integrates human visual system (HVS)-inspired multi-scale saliency modeling with local texture awareness, employs Fusion-Window Attention to guide attention allocation via saliency maps, and introduces a Local Perception Constraint to enhance sensitivity to localized distortions. It further decouples local texture representation from neighborhood dependencies to improve fine-grained distortion discrimination. The authors also construct LPVQ, the first publicly available VQA dataset with pixel-accurate region-level quality annotations. Extensive experiments show that KVQ achieves state-of-the-art performance across five mainstream VQA benchmarks, and experiments on LPVQ confirm its ability to precisely localize spatially varying distortions. Code and the LPVQ dataset are publicly released.
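The summary's mention of saliency maps guiding attention allocation suggests a window-attention variant whose logits are biased by saliency. Below is a minimal, hypothetical PyTorch sketch of that idea; the paper's actual Fusion-Window Attention (FWA) design is not specified here, and the function name `saliency_window_attention`, the additive-bias scheme, and all hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import torch

def saliency_window_attention(x, saliency, window=8, num_heads=4):
    """x: (B, C, H, W) feature map; saliency: (B, 1, H, W) in [0, 1].
    Hypothetical sketch: Q/K/V linear projections are omitted for brevity,
    and H, W are assumed divisible by `window`."""
    B, C, H, W = x.shape
    ws = window
    # Partition features into non-overlapping ws x ws windows: (B*nW, ws*ws, C).
    xw = x.unfold(2, ws, ws).unfold(3, ws, ws)                # (B, C, nH, nW, ws, ws)
    xw = xw.permute(0, 2, 3, 4, 5, 1).reshape(-1, ws * ws, C)
    # Partition the saliency map the same way: one scalar per token per window.
    sw = saliency.unfold(2, ws, ws).unfold(3, ws, ws)
    sw = sw.permute(0, 2, 3, 4, 5, 1).reshape(-1, ws * ws, 1)
    head_dim = C // num_heads
    q = k = v = xw.reshape(-1, ws * ws, num_heads, head_dim).transpose(1, 2)
    logits = q @ k.transpose(-2, -1) / head_dim ** 0.5        # (B*nW, heads, N, N)
    # Assumption: salient keys receive a positive additive bias, steering
    # attention mass toward salient positions within each window.
    logits = logits + sw.transpose(1, 2).unsqueeze(1)         # broadcast over heads, queries
    attn = logits.softmax(dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(-1, ws * ws, C)
    return out  # still window-partitioned; reverse partition omitted
```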
📝 Abstract
Video Quality Assessment (VQA), which aims to predict the perceptual quality of videos, has attracted increasing attention. Due to factors such as motion blur or localized distortions, quality varies across different regions of a video. Recognizing the region-wise local quality within a video is beneficial for assessing global quality and can guide fine-grained enhancement or transcoding strategies. Because annotating region-wise quality is costly, existing datasets provide no ground-truth constraints for local quality, which further complicates exploiting local perception. Inspired by the Human Visual System (HVS), which links global quality to the local texture of different regions and to their visual saliency, we propose a Kaleidoscope Video Quality Assessment (KVQ) framework that effectively assesses both saliency and local texture, thereby facilitating the assessment of global quality. Our framework extracts visual saliency and allocates attention using Fusion-Window Attention (FWA), while incorporating a Local Perception Constraint (LPC) to mitigate the reliance of regional texture perception on neighboring areas. KVQ achieves significant improvements over state-of-the-art (SOTA) methods across multiple scenarios on five VQA benchmarks. Furthermore, to assess local perception, we establish a new Local Perception Visual Quality (LPVQ) dataset with region-wise annotations. Experimental results demonstrate the capability of KVQ in perceiving local distortions. KVQ models and the LPVQ dataset will be available at https://github.com/qyp2000/KVQ.
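To make the Local Perception Constraint idea concrete, here is a hedged sketch of one way such a constraint could be realized: a consistency loss that penalizes changes in a patch's predicted local quality when its neighborhood is shuffled, discouraging the model from inferring a region's texture quality from surrounding areas. The per-region model API, the patch-shuffling scheme, and the name `local_perception_loss` are assumptions for illustration; the abstract does not give the paper's actual LPC formulation.

```python
import torch
import torch.nn.functional as F

def local_perception_loss(model, frames, patch=32):
    """frames: (B, C, H, W) with H and W divisible by `patch`.
    `model` is assumed (hypothetically) to output a per-region quality map
    of shape (B, 1, H // patch, W // patch)."""
    B, C, H, W = frames.shape
    gh, gw = H // patch, W // patch
    q_full = model(frames)                                   # quality in original context
    # Shuffle patches spatially: each patch keeps its own texture but loses
    # its original neighborhood.
    tiles = frames.unfold(2, patch, patch).unfold(3, patch, patch)  # (B, C, gh, gw, p, p)
    tiles = tiles.reshape(B, C, gh * gw, patch, patch)
    perm = torch.randperm(gh * gw, device=frames.device)
    shuffled = tiles[:, :, perm].reshape(B, C, gh, gw, patch, patch)
    shuffled = shuffled.permute(0, 1, 2, 4, 3, 5).reshape(B, C, H, W)
    q_shuf = model(shuffled)                                 # quality with neighbors perturbed
    # Match each patch to the score it received in its original context:
    # position j of the shuffled grid holds original patch perm[j].
    loss = F.l1_loss(q_shuf.flatten(2), q_full.flatten(2)[:, :, perm])
    return loss
```

Under this reading, driving the loss toward zero means a region's predicted quality depends on its own texture rather than on what happens to surround it, which is one plausible interpretation of "mitigating the reliance of regional texture perception on neighboring areas."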