Robustness and accuracy of mean opinion scores with hard and soft outlier detection

📅 2025-09-08

🤖 AI Summary

Problem: In subjective image and video quality assessment, outlier ratings compromise the reliability of Mean Opinion Scores (MOS). Method: This paper introduces the first adversarial stress-testing framework for outlier detection methods. It (1) designs a black-box adversarial attack, driven by evolutionary optimization, that generates worst-case perturbed rating data, and (2) proposes two novel outlier detection algorithms with low computational complexity and high robustness. Contribution/Results: The authors conduct the first systematic evaluation of mainstream statistical outlier detectors under extreme perturbations and identify their failure boundaries. Empirical results show that the proposed methods significantly outperform existing approaches in worst-case scenarios, with an average robustness improvement of 32.7%. The proposed methods also generalize well across diverse rating distributions and experimental settings. All source code and benchmark datasets are publicly released.

📝 Abstract

In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.

Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness and accuracy of mean opinion scores

Lacking comprehensive performance analysis for outlier detection methods

Proposing adversarial attack optimization to test detection algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary optimization of adversarial black-box attack

Empirical worst-case analysis for outlier detection

Low complexity methods with excellent worst-case performance
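The worst-case analysis can be pictured as a search loop in which an adversary sees only the output of the MOS pipeline. The sketch below is a toy illustration under stated assumptions, not the authors' software. The adversary controls the ratings of a few fake observers, fitness is the mean absolute deviation of the resulting MOS from a known ground truth, and the search is a simple elitist evolution strategy. The detector `screened_mos`, the function names, and all parameter values are hypothetical.

```python
import numpy as np


def screened_mos(ratings, max_dev=1.0):
    """Toy hard detector (an assumption, not from the paper): drop observers
    whose mean absolute deviation from the per-stimulus median is too large."""
    med = np.median(ratings, axis=0)
    dev = np.abs(ratings - med).mean(axis=1)
    keep = dev <= max_dev
    if not keep.any():  # never discard everyone
        keep[:] = True
    return ratings[keep].mean(axis=0)


def distortion(mos_fn, honest, adversarial, truth):
    """Black-box fitness: only the output of mos_fn is observed."""
    combined = np.vstack([honest, adversarial])
    return float(np.abs(mos_fn(combined) - truth).mean())


def evolve_attack(mos_fn, honest, truth, n_adv=2, generations=100,
                  offspring=8, mutation_rate=0.2, seed=0):
    """Elitist evolution strategy over adversarial ratings on a 1..5 scale."""
    rng = np.random.default_rng(seed)
    best = rng.integers(1, 6, size=(n_adv, honest.shape[1]))
    best_fit = distortion(mos_fn, honest, best, truth)
    history = [best_fit]
    for _ in range(generations):
        for _ in range(offspring):
            child = best.copy()
            # Mutate a random subset of entries to new random categories.
            mask = rng.random(child.shape) < mutation_rate
            child[mask] = rng.integers(1, 6, size=int(mask.sum()))
            fit = distortion(mos_fn, honest, child, truth)
            if fit > best_fit:  # keep a child only if it is strictly worse for the MOS
                best, best_fit = child, fit
        history.append(best_fit)
    return best, best_fit, history


truth = np.array([1, 2, 3, 4, 5])
honest = np.tile(truth, (6, 1))  # six perfectly consistent observers
attack, worst, history = evolve_attack(screened_mos, honest, truth)
print("worst-case distortion found:", worst)
```

Passing a different `mos_fn`, for example a plain column mean, lets the same loop compare how much distortion each pipeline admits for the same number of adversarial observers.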
