🤖 AI Summary
Existing no-reference image quality assessment (NR-IQA) methods often neglect local manifold structures, leading to insufficient discriminability for challenging distortions. To address this, we propose a contrastive learning framework that explicitly preserves local manifold geometry. First, we introduce non-salient regions from the same image as intra-image negative samples to enhance local discriminability. Second, we design a saliency-guided dual-branch mutual learning mechanism to adaptively emphasize critical visual regions. Third, we integrate multi-scale cropping sampling with a local manifold-constrained contrastive loss. Extensive experiments on seven benchmark datasets demonstrate state-of-the-art performance: PLCC scores of 0.942 on TID2013 and 0.914 on LIVEC, surpassing all prior methods. Crucially, our approach significantly improves perceptual modeling of structurally distorted and noisy images, validating its effectiveness for difficult distortion cases.
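The multi-scale cropping sampling step can be sketched as follows. This is a minimal illustration assuming a NumPy image array; local variance stands in for the learned saliency score used by the actual framework, and the function name, scales, and crop counts are hypothetical:

```python
import numpy as np

def sample_multiscale_crops(img, scales=(0.3, 0.5, 0.7), crops_per_scale=4, rng=None):
    """Sample square crops at several relative scales and rank them by a
    simple saliency proxy (pixel variance). The real framework would use a
    learned saliency map; variance is only a stand-in for illustration."""
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    crops = []
    for s in scales:
        ch, cw = max(1, int(h * s)), max(1, int(w * s))
        for _ in range(crops_per_scale):
            y = rng.integers(0, h - ch + 1)  # random top-left corner
            x = rng.integers(0, w - cw + 1)
            crops.append(img[y:y + ch, x:x + cw])
    # most "salient" crop first; the remaining crops then serve as
    # positives or intra-image negatives depending on their saliency
    crops.sort(key=lambda c: float(np.var(c)), reverse=True)
    return crops
```

The first element of the returned list plays the role of the salient anchor crop; the rest are candidates for the positive and intra-image negative sets.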
📝 Abstract
Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often neglect the importance of preserving the local manifold structure. This oversight can result in a high degree of similarity among hard examples within the feature space, thereby impeding effective differentiation and assessment. To address this issue, we propose an innovative framework that integrates local manifold learning with contrastive learning for No-Reference Image Quality Assessment (NR-IQA). Our method begins by sampling multiple crops from a given image and identifying the most visually salient crop. This crop is then used to cluster the other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance. Uniquely, our approach also treats non-salient crops from the same image as intra-class negatives to preserve their distinctiveness. Additionally, we employ a mutual learning framework, which further enhances the model's ability to adaptively learn and identify visually salient regions. Our approach outperforms state-of-the-art methods on seven standard datasets, achieving PLCC values of 0.942 on TID2013 (vs. 0.908 for the best prior method) and 0.914 on LIVEC (vs. 0.894).
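The positive/negative construction described in the abstract can be sketched as an InfoNCE-style contrastive loss. This is a minimal illustration, not the paper's exact formulation: the function name and temperature are assumptions, plain NumPy is used in place of a deep learning framework, and all inputs are assumed to be pre-extracted, L2-normalized feature vectors.

```python
import numpy as np

def info_nce_with_intra_negatives(anchor, positives, inter_negatives,
                                  intra_negatives, temperature=0.1):
    """InfoNCE-style loss where the salient crop (anchor) pulls the other
    crops of the same image (positives) together while pushing away both
    crops from other images (inter_negatives) and non-salient crops of the
    same image (intra_negatives). Inputs: anchor is a 1-D unit vector; the
    other arguments are 2-D arrays of unit vectors, one row per crop."""
    def exp_sims(vecs):
        # exponentiated cosine similarities to the anchor
        return np.exp(np.dot(vecs, anchor) / temperature)

    pos = exp_sims(positives)
    neg = np.concatenate([exp_sims(inter_negatives), exp_sims(intra_negatives)])
    denom = pos.sum() + neg.sum()
    # average the per-positive InfoNCE terms
    return float(-np.mean(np.log(pos / denom)))
```

Treating non-salient crops of the same image as additional negatives is what distinguishes this from a standard image-level contrastive objective: crops of one image are no longer forced to collapse to a single point, which is how the method preserves local structure in the feature space.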