🤖 AI Summary
Evaluating novel view synthesis (NVS) image quality is challenging when pixel-aligned ground-truth references are unavailable: full-reference (FR) methods become inapplicable, and no-reference (NR) methods generalize poorly. To address this, we propose NAR-IQA, a quality assessment framework tailored to non-aligned references, in which the reference shares partial scene content with the rendered view but is not pixel-aligned. Our method introduces three key components: (1) a contrastive learning-based feature extractor built on LoRA-fine-tuned DINOv2 embeddings, enhancing cross-view semantic consistency modeling; (2) a synthetic distortion generation strategy targeting Temporal Regions of Interest (TROI), improving robustness to realistic NVS distortions while avoiding overfitting to specific real NVS samples; and (3) training guided by supervision signals from existing IQA methods. Experiments demonstrate that NAR-IQA outperforms state-of-the-art FR-IQA, NR-IQA, and prior NAR-IQA approaches under both aligned and non-aligned settings, and its predictions correlate strongly with subjective ratings collected in a new user study on non-aligned references. The framework exhibits strong practicality and generalization across diverse NVS scenarios.
📝 Abstract
Evaluating the perceptual quality of Novel View Synthesis (NVS) images remains a key challenge, particularly in the absence of pixel-aligned ground-truth references. Full-Reference Image Quality Assessment (FR-IQA) methods fail under misalignment, while No-Reference (NR-IQA) methods struggle to generalize. In this work, we introduce a Non-Aligned Reference (NAR-IQA) framework tailored for NVS, in which the reference view is assumed to share partial scene content with the rendered view but lacks pixel-level alignment. We construct a large-scale image dataset containing synthetic distortions targeting Temporal Regions of Interest (TROI) to train our NAR-IQA model. The model is built on a contrastive learning framework that incorporates LoRA-enhanced DINOv2 embeddings and is guided by supervision from existing IQA methods. We train exclusively on synthetically generated distortions, deliberately avoiding overfitting to specific real NVS samples and thereby improving the model's generalization. Our model outperforms state-of-the-art FR-IQA, NR-IQA, and NAR-IQA methods, achieving robust performance on both aligned and non-aligned references. We also conduct a novel user study to gather data on human preferences when viewing non-aligned references in NVS, and find a strong correlation between our proposed quality prediction model and the collected subjective ratings. For the dataset and code, please visit our project page: https://stootaghaj.github.io/nova-project/
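The two core ingredients described above (a frozen pretrained backbone adapted with low-rank LoRA updates, and a contrastive objective that pulls a rendered view toward its non-aligned reference) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the actual model operates on DINOv2 embeddings, and the dimensions, temperature, and initialization below are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; DINOv2 produces higher-dimensional embeddings,
# but everything here is an illustrative stand-in.
d_in, d_out, rank, alpha = 32, 32, 4, 8.0

# Frozen pretrained projection (stand-in for a backbone weight matrix).
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# LoRA adapters: only A and B would be trained; W stays frozen.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero-init, so the adapted model starts at W


def lora_forward(x):
    # W x + (alpha / rank) * B A x  -- the low-rank residual update
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)


def info_nce(z_ref, z_dist, tau=0.1):
    # Contrastive (InfoNCE-style) loss: each reference embedding should be
    # closest to its own distorted rendering within the batch, even though
    # the image pair is not pixel-aligned.
    z_ref = z_ref / np.linalg.norm(z_ref, axis=1, keepdims=True)
    z_dist = z_dist / np.linalg.norm(z_dist, axis=1, keepdims=True)
    logits = z_ref @ z_dist.T / tau  # scaled cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positive pairs on the diagonal


# Toy batch: 8 reference views and mildly perturbed "renderings" of them.
x_ref = rng.standard_normal((8, d_in))
x_dist = x_ref + 0.1 * rng.standard_normal((8, d_in))

loss = info_nce(lora_forward(x_ref), lora_forward(x_dist))
print(f"InfoNCE loss: {loss:.3f}")
```

With the zero-initialized `B`, the adapted projection is exactly the frozen one at the start of training; gradients then flow only through the small `A` and `B` matrices, which is what makes LoRA adaptation cheap compared with fine-tuning the full backbone.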