GOMPSNR: Reflourish the Signal-to-Noise Ratio Metric for Audio Generation Tasks

📅 2026-01-20
🤖 AI Summary
This work addresses the weak correlation between the traditional signal-to-noise ratio (SNR) and human auditory perception in audio generation tasks, which limits its usefulness for quality assessment. The study identifies inadequate measurement of phase distance as a key cause and reformulates SNR with dedicated phase-distance terms, yielding a new metric, GOMPSNR. Two categories of loss function are derived from the same formulation: magnitude-guided phase refinement and joint magnitude-phase optimization. Experiments on neural vocoders indicate that GOMPSNR measures error more reliably than SNR and that the proposed loss functions substantially improve model performance.

📝 Abstract
In the field of audio generation, signal-to-noise ratio (SNR) has long served as an objective metric for evaluating audio quality. Nevertheless, recent studies have shown that SNR and its variants are not always highly correlated with human perception, which raises two questions: why does SNR fail to measure audio quality, and how can its reliability as an objective metric be improved? In this paper, we identify the inadequate measurement of phase distance as a pivotal factor and propose to reformulate SNR with specially designed phase-distance terms, yielding an improved metric named GOMPSNR. We further extend the proposed formulation to derive two novel categories of loss function, corresponding to magnitude-guided phase refinement and joint magnitude-phase optimization, respectively. We also conduct extensive experiments to find an optimal combination of the different loss functions. Experimental results on advanced neural vocoders demonstrate that GOMPSNR provides more reliable error measurement than SNR. Meanwhile, the proposed loss functions yield substantial improvements in model performance, and a well-chosen combination of loss functions further improves overall model capability.
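
As background for the abstract's point about phase, here is a minimal sketch of the conventional time-domain SNR, 10·log10(‖s‖² / ‖s − ŝ‖²). It is not GOMPSNR, whose formula is not given on this page. The function name `snr_db`, the 440 Hz test tone, and the 16 kHz sample rate are illustrative assumptions. The sketch compares a tone with a constant 90-degree phase offset (identical magnitude spectrum) against a tone scaled by 0.9.

```python
import numpy as np


def snr_db(reference, estimate):
    """Conventional SNR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Illustrative helper (hypothetical name); this is NOT the GOMPSNR metric.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    eps = 1e-12  # guards against division by zero for identical signals
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(noise**2) + eps))


# One second of a 440 Hz tone at 16 kHz (assumed values for illustration).
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t)

# Same magnitude spectrum, constant 90-degree phase offset.
shifted = np.sin(2 * np.pi * 440 * t + np.pi / 2)

# Same phase, amplitude reduced by 10 percent.
scaled = 0.9 * reference

snr_shifted = snr_db(reference, shifted)
snr_scaled = snr_db(reference, scaled)
print(f"phase-shifted: {snr_shifted:.2f} dB, scaled: {snr_scaled:.2f} dB")
```

For this tone, the phase-shifted copy scores about -3 dB while the scaled copy scores about 20 dB, even though a constant phase offset on a steady tone is typically hard to hear. Conventional SNR penalizes phase through the raw waveform difference, which is one way to see why a dedicated treatment of phase distance might be useful.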
Problem

Research questions and friction points this paper is trying to address.

signal-to-noise ratio
audio generation
perceptual quality
objective metric
phase distance
Innovation

Methods, ideas, or system contributions that make the work stand out.

GOMPSNR
phase-aware metric
audio generation
magnitude-phase optimization
perceptual audio quality
Lingling Dai
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Andong Li
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Cheng Chi
Columbia University, Stanford University
robotics
Yifan Liang
Huazhong University of Science and Technology
Computer Vision, Machine Learning
Xiaodong Li
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
C. Zheng
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China