Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer

📅 2025-07-19
🤖 AI Summary
To address the poor generalization of naturalness MOS predictors across sampling frequencies (SFs), this paper proposes an SF-independent (SFI) MOS prediction framework built on a self-supervised learning (SSL) model. The core idea is an SFI convolutional layer, integrated into the SSL model, that makes speech feature extraction independent of the input sampling rate. Two training strategies improve prediction performance: knowledge distillation from a pretrained non-SFI SSL model and pretraining on a large-scale MOS dataset. On AudioMOS Challenge (AMC) 2025 Track 3, the submission ranked first in one evaluation metric and fourth in the final ranking. An ablation study examines which factors of the model are essential.

📝 Abstract
We introduce our submission to the AudioMOS Challenge (AMC) 2025 Track 3: mean opinion score (MOS) prediction for speech with multiple sampling frequencies (SFs). Our submitted model integrates an SF-independent (SFI) convolutional layer into a self-supervised learning (SSL) model to achieve SFI speech feature extraction for MOS prediction. We present two strategies to improve the MOS prediction performance of our model: distilling knowledge from a pretrained non-SFI SSL model and pretraining with a large-scale MOS dataset. Our submission to AMC 2025 Track 3 ranked first in one evaluation metric and fourth in the final ranking. We also report the results of an ablation study that investigates the essential factors of our model.
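The abstract does not describe how the SFI convolutional layer is built. One way to obtain sampling-rate independence, assumed here purely for illustration, is to define each filter in continuous time (in Hz and seconds) and sample it at whatever rate the input uses. The sketch below follows that assumption; the Gabor-like filter shape and the names `sfi_kernel`, `tone`, and `response_energy` are hypothetical and are not taken from the paper.

```python
import math


def sfi_kernel(center_hz, bandwidth_hz, sample_rate, duration_s=0.004):
    """Sample a continuous-time Gabor-like filter at the given sample rate.

    Assumption: the filter is parameterized in physical units (Hz, seconds),
    so the same parameters yield a matching kernel at any sample rate.
    """
    half = int(round(duration_s * sample_rate / 2))
    kernel = []
    for n in range(-half, half + 1):
        t = n / sample_rate  # time in seconds
        envelope = math.exp(-0.5 * (2 * math.pi * bandwidth_hz * t) ** 2)
        # Dividing by sample_rate approximates the continuous-time integral.
        kernel.append(envelope * math.cos(2 * math.pi * center_hz * t) / sample_rate)
    return kernel


def tone(freq_hz, sample_rate, duration_s=0.05):
    """Generate a sine tone sampled at sample_rate."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(int(duration_s * sample_rate))
    ]


def response_energy(signal, kernel):
    """Mean squared output of a 'valid' 1-D correlation of signal with kernel."""
    k = len(kernel)
    outputs = [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
    return sum(y * y for y in outputs) / len(outputs)


# The same analog filter (1 kHz center, 200 Hz bandwidth) applied at two rates.
for rate in (16000, 48000):
    kernel = sfi_kernel(1000.0, 200.0, rate)
    energy = response_energy(tone(1000.0, rate), kernel)
    print(rate, len(kernel), energy)
```

Under this assumption the kernel grows from 65 taps at 16 kHz to 193 taps at 48 kHz, while the response to the same 1 kHz tone stays nearly unchanged, which is the property an SFI front end needs.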
Problem

Research questions and friction points this paper is trying to address.

Predict MOS for speech with multiple sampling frequencies
Achieve sampling-frequency-independent speech feature extraction
Improve MOS prediction performance through knowledge distillation and large-scale MOS pretraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

SF-independent convolutional layer for feature extraction
Knowledge distillation from pretrained non-SFI-SSL model
Pretraining with large-scale MOS dataset
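
The summary does not state which loss or which layers are used for distillation. As a minimal sketch, assume the SFI student is trained to match frame-level features of the non-SFI SSL teacher with a mean squared error term, added to an L1 MOS regression loss. The function names, the choice of losses, and the `distill_weight` parameter are all hypothetical.

```python
def distillation_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher frame features.

    Both arguments are lists of frames; each frame is a list of floats.
    Assumption: the two models produce features with identical shapes.
    """
    total, count = 0.0, 0
    for s_frame, t_frame in zip(student_feats, teacher_feats):
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    return total / count


def total_loss(pred_mos, true_mos, student_feats, teacher_feats, distill_weight=1.0):
    """L1 MOS regression loss plus a weighted feature-matching term (hypothetical)."""
    mos_loss = abs(pred_mos - true_mos)
    return mos_loss + distill_weight * distillation_loss(student_feats, teacher_feats)


# Toy example: two frames with two feature dimensions each.
student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.0, 2.0], [3.0, 6.0]]
print(total_loss(3.5, 4.0, student, teacher, distill_weight=0.5))
```

The weighting lets the feature-matching term act as a regularizer that keeps the SFI model close to the pretrained teacher while the regression term fits the MOS labels.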