SingVERSE: A Diverse, Real-World Benchmark for Singing Voice Enhancement

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Singing voice enhancement research is held back by the lack of realistic evaluation data. SingVERSE is introduced as the first real-world benchmark for the task, covering diverse acoustic scenarios and providing paired, studio-quality clean references. Using SingVERSE, the authors evaluate state-of-the-art enhancement models and find a consistent trade-off between perceptual quality and intelligibility. They also show that training on in-domain singing data substantially improves enhancement performance without degrading performance on speech. The benchmark gives the community a common basis for evaluating singing voice enhancement.

📝 Abstract

This paper presents a benchmark for singing voice enhancement. The development of singing voice enhancement is limited by the lack of realistic evaluation data. To address this gap, this paper introduces SingVERSE, the first real-world benchmark for singing voice enhancement, covering diverse acoustic scenarios and providing paired, studio-quality clean references. Leveraging SingVERSE, we conduct a comprehensive evaluation of state-of-the-art models and uncover a consistent trade-off between perceptual quality and intelligibility. Finally, we show that training on in-domain singing data substantially improves enhancement performance without degrading speech capabilities, establishing a simple yet effective path forward. This work offers the community a foundational benchmark together with critical insights to guide future advances in this underexplored domain. Demo page: https://singverse.github.io

Problem

Research questions and friction points this paper is trying to address.

Addresses the lack of realistic evaluation data for singing voice enhancement

Introduces the first real-world benchmark covering diverse acoustic scenarios

Investigates the trade-off between perceptual quality and intelligibility across state-of-the-art enhancement models

Innovation

Methods, ideas, or system contributions that make the work stand out.

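Created a real-world singing benchmark with paired, studio-quality clean references

Uncovered a consistent trade-off between perceptual quality and intelligibility across state-of-the-art models

Showed that training on in-domain singing data substantially improves enhancement performance without degrading speech capabilities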

🔎 Similar Papers

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

2024-09-20 · arXiv.org · Citations: 2
