Robustness and accuracy of mean opinion scores with hard and soft outlier detection

📅 2025-09-08
🤖 AI Summary
In subjective image and video quality assessment, outlier ratings compromise the reliability of mean opinion scores (MOS). This paper introduces the first adversarial stress-testing framework for outlier detection methods. It (1) designs a black-box adversarial attack, driven by evolutionary optimization, that generates worst-case perturbed rating data, and (2) proposes two new outlier detection algorithms with low computational complexity and high robustness. The authors systematically evaluate mainstream statistical outlier detectors under extreme perturbations and identify where they fail. According to the summary, the proposed methods outperform existing approaches in worst-case scenarios, with an average robustness improvement of 32.7%, and generalize across diverse rating distributions and experimental settings. Source code and benchmark datasets are publicly released.

📝 Abstract
In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.
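The worst-case analysis described in the abstract can be illustrated with a small sketch. It is not the authors' software: the detector (`mos_with_hard_rejection`, a correlation-threshold rule), the synthetic data generator, the (1+λ) evolution strategy, and all parameter values are assumptions chosen for illustration. The adversary controls a few observers, sees only the detector's output (black box), and evolves their ratings to maximize the RMSE between the resulting MOS and the ground truth.

```python
import numpy as np

def mos_with_hard_rejection(ratings, min_corr=0.75):
    """Hypothetical hard detector: drop observers whose ratings correlate
    poorly with the overall MOS, then average the remaining observers."""
    mos_all = ratings.mean(axis=0)
    if mos_all.std() == 0:
        return mos_all
    keep = np.array([
        r.std() > 0 and np.corrcoef(r, mos_all)[0, 1] >= min_corr
        for r in ratings
    ])
    if not keep.any():          # never reject everyone
        keep[:] = True
    return ratings[keep].mean(axis=0)

def distortion(adv, honest, truth, detector):
    """RMSE between the detector's MOS (honest + adversarial rows) and truth."""
    ratings = np.vstack([honest, adv])
    return float(np.sqrt(np.mean((detector(ratings) - truth) ** 2)))

def evolve_attack(honest, truth, detector, n_adv=3, generations=200,
                  offspring=8, mutation_rate=0.1, seed=0):
    """Elitist (1+lambda) evolution strategy over adversarial ACR ratings (1..5)."""
    rng = np.random.default_rng(seed)
    best = rng.integers(1, 6, size=(n_adv, honest.shape[1]))
    best_fit = initial_fit = distortion(best, honest, truth, detector)
    for _ in range(generations):
        parent = best
        for _ in range(offspring):
            child = parent.copy()
            mask = rng.random(child.shape) < mutation_rate
            child[mask] = rng.integers(1, 6, size=int(mask.sum()))
            fit = distortion(child, honest, truth, detector)
            if fit > best_fit:  # keep the most damaging candidate so far
                best, best_fit = child, fit
    return best, initial_fit, best_fit

def make_synthetic_ratings(n_obs=20, n_stimuli=15, noise=0.7, seed=1):
    """Synthetic honest observers: ground truth plus noise, rounded to 1..5."""
    rng = np.random.default_rng(seed)
    truth = rng.uniform(1.5, 4.5, size=n_stimuli)
    noisy = truth + rng.normal(0, noise, size=(n_obs, n_stimuli))
    return truth, np.clip(np.rint(noisy), 1, 5)

if __name__ == "__main__":
    truth, honest = make_synthetic_ratings()
    adv, before, after = evolve_attack(honest, truth, mos_with_hard_rejection)
    print(f"RMSE with random adversaries: {before:.3f}, after evolution: {after:.3f}")
```

Because the search is elitist, the final distortion is never lower than that of the random starting point. Swapping in a different `detector` function makes it possible to compare methods under the same attack budget, which is the spirit of the stress test described above.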

Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness and accuracy of mean opinion scores
Lacking comprehensive performance analysis for outlier detection methods
Proposing adversarial attack optimization to test detection algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evolutionary optimization of adversarial black-box attack
Empirical worst-case analysis for outlier detection
Low complexity methods with excellent worst-case performance
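To make the distinction between hard and soft outlier detection concrete, here is a minimal sketch. It does not reproduce the two methods proposed in the paper or any standardized procedure. The agreement score (Pearson correlation of each observer with the mean of the other observers), the threshold, and the function names are assumptions for illustration only. A hard method removes flagged observers entirely, while a soft method keeps everyone but down-weights unreliable observers.

```python
import numpy as np

def observer_agreement(ratings):
    """Correlation of each observer with the mean of the *other* observers.
    Constant raters (undefined correlation) get a score of 0."""
    n = ratings.shape[0]
    total = ratings.sum(axis=0)
    scores = np.zeros(n)
    for i, r in enumerate(ratings):
        others = (total - r) / (n - 1)
        if r.std() > 0 and others.std() > 0:
            scores[i] = np.corrcoef(r, others)[0, 1]
    return scores

def hard_mos(ratings, threshold=0.5):
    """Hard detection: discard observers below the threshold, then average."""
    keep = observer_agreement(ratings) >= threshold
    if not keep.any():          # fall back to all observers
        keep[:] = True
    return ratings[keep].mean(axis=0)

def soft_mos(ratings):
    """Soft detection: weighted mean with non-negative agreement weights."""
    w = np.clip(observer_agreement(ratings), 0.0, None)
    if w.sum() == 0:            # fall back to the plain mean
        w = np.ones_like(w)
    return (w[:, None] * ratings).sum(axis=0) / w.sum()

if __name__ == "__main__":
    ratings = np.array([
        [1, 2, 3, 4, 5],
        [1, 2, 3, 5, 5],
        [2, 2, 3, 4, 5],
        [1, 3, 3, 4, 4],
        [5, 4, 3, 2, 1],        # an observer who inverts the scale
    ])
    print("plain MOS:", ratings.mean(axis=0))
    print("hard MOS: ", hard_mos(ratings))
    print("soft MOS: ", soft_mos(ratings))
```

In this toy example the inverted observer has negative agreement, so the hard variant drops that row and the soft variant assigns it zero weight. The two variants differ mainly for borderline observers, who are either fully kept or fully removed under a hard rule but contribute partially under a soft one.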
Dietmar Saupe
Professor of Computer Science, University of Konstanz, Germany
Multimedia Signal Processing, Sport Informatics
Tim Bleile
Department of Computer and Information Science, University of Konstanz, Konstanz, Germany