Towards Automatic Evaluation and High-Quality Pseudo-Parallel Dataset Construction for Audio Editing: A Human-in-the-Loop Method

📅 2025-08-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Audio editing has long suffered from the lack of high-quality evaluation benchmarks and reliable automated assessment metrics. To address this, we propose an expert-knowledge-driven closed-loop evaluation framework. First, we construct AuditScore, the first subjective evaluation dataset for audio editing, comprising over 6,300 samples annotated with multi-dimensional professional ratings. Second, we train AuditEval, an automatic Mean Opinion Score (MOS) prediction model that achieves high accuracy in quality estimation. Third, we leverage AuditEval in a reverse pipeline to filter and refine synthetic data, generating a validated pseudo-parallel dataset of superior quality. This work pioneers the organic integration of expert scoring, automated evaluation, and data curation: it introduces the first task-specific audio editing evaluation model and benchmark dataset, and establishes an "evaluate-feedback-generate" closed-loop paradigm, providing a reproducible, scalable foundation for future research and development in audio editing.

📝 Abstract
Audio editing aims to manipulate audio content based on textual descriptions, supporting tasks such as adding, removing, or replacing audio events. Despite recent progress, the lack of high-quality benchmark datasets and comprehensive evaluation metrics remains a major challenge for both assessing audio editing quality and improving the task itself. In this work, we propose a novel approach to the audio editing task by incorporating expert knowledge into both the evaluation and dataset construction processes: 1) First, we establish AuditScore, the first comprehensive dataset for subjective evaluation of audio editing, consisting of over 6,300 edited samples generated from 7 representative audio editing frameworks and 23 system configurations. Each sample is annotated by professional raters on three key aspects of audio editing quality: overall Quality, Relevance to the editing intent, and Faithfulness to the original features. 2) Based on this dataset, we train AuditEval, the first model designed for automatic MOS-style scoring tailored to audio editing tasks. AuditEval addresses the critical lack of objective evaluation metrics and the prohibitive cost of subjective assessment in this field. 3) We further leverage AuditEval to evaluate and filter a large number of synthetically mixed editing pairs, constructing a high-quality pseudo-parallel dataset by selecting the most plausible samples. Objective experiments validate the effectiveness of our expert-informed filtering strategy in yielding higher-quality data, while also revealing the limitations of relying solely on objective metrics. The dataset, code, and tools can be found at: https://github.com/NKU-HLT/AuditEval.
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality benchmark datasets for audio editing tasks
Absence of comprehensive evaluation metrics for audio editing quality
Need for expert-informed methods to construct pseudo-parallel datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Establish AuditScore dataset for subjective evaluation
Train AuditEval model for automatic MOS scoring
Construct pseudo-parallel dataset via expert filtering
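The filtering step above can be sketched as follows. This is a minimal illustration, not the authors' released code: `score_sample` is a hypothetical stand-in for AuditEval inference, returning MOS-style scores on the three annotated axes (Quality, Relevance, Faithfulness), and the threshold and top-k values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class EditPair:
    source: str       # path to the original audio
    edited: str       # path to the synthetically edited audio
    instruction: str  # textual editing instruction


def score_sample(pair: EditPair) -> dict:
    """Placeholder for AuditEval inference (hypothetical API).

    A real implementation would run the trained AuditEval model on the
    audio pair and return predicted MOS-style scores per axis.
    """
    return {"quality": 4.1, "relevance": 3.8, "faithfulness": 4.0}


def filter_pairs(pairs, threshold=3.5, top_k=None):
    """Keep pairs whose mean predicted score clears `threshold`;
    optionally retain only the `top_k` best candidates to form
    the pseudo-parallel training set."""
    scored = []
    for pair in pairs:
        scores = score_sample(pair)
        mean_score = sum(scores.values()) / len(scores)
        if mean_score >= threshold:
            scored.append((mean_score, pair))
    # Rank by predicted mean score, best first.
    scored.sort(key=lambda item: item[0], reverse=True)
    if top_k is not None:
        scored = scored[:top_k]
    return [pair for _, pair in scored]
```

In practice the threshold (or top-k budget) would be tuned against held-out human ratings, since the paper notes that objective metrics alone can be misleading.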
Yuhang Jia
College of Computer Science, Nankai University, Tianjin, China
Hui Wang
College of Computer Science, Nankai University, Tianjin, China
Xin Nie
College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China
Yujie Guo
Lianru Gao
Aerospace Information Research Institute, Chinese Academy of Sciences
Yong Qin
College of Computer Science, Nankai University, Tianjin, China