🤖 AI Summary
Current singing voice synthesis (SVS) and singing voice conversion (SVC) research is hindered by the lack of a large-scale, diverse, real-world dataset. To address this, we introduce SingNet, a publicly available, large-scale, multilingual, multi-style in-the-wild singing voice dataset comprising 3,000 hours of audio. We design an automated pipeline for web crawling, source separation, phoneme-level alignment, and data cleaning, tailored to realistic singing scenarios. Furthermore, we pre-train and open-source state-of-the-art models, including Wav2vec2, BigVGAN, and NSF-HiFiGAN, on the collected data. Benchmark experiments on Automatic Lyric Transcription (ALT), neural vocoding, and SVC demonstrate consistent improvements over models trained on existing datasets. SingNet establishes a robust foundation for advancing research in singing voice modeling and generation.
📝 Abstract
The lack of a publicly available, large-scale, and diverse dataset has long been a significant bottleneck for singing voice applications such as Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we present SingNet, an extensive, diverse, and in-the-wild singing voice dataset. Specifically, we propose a data processing pipeline to extract ready-to-use training data from sample packs and songs on the internet, forming 3,000 hours of singing voices in various languages and styles. Furthermore, to facilitate the use and demonstrate the effectiveness of SingNet, we pre-train and open-source various state-of-the-art (SOTA) models, including Wav2vec2, BigVGAN, and NSF-HiFiGAN, on our collected singing voice data. We also conduct benchmark experiments on Automatic Lyric Transcription (ALT), Neural Vocoding, and Singing Voice Conversion (SVC). Audio demos are available at: https://singnet-dataset.github.io/.
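The abstract describes a four-stage data processing pipeline: crawling raw songs, separating the vocal stem, aligning phonemes, and cleaning the result. The sketch below is a hypothetical skeleton of such a pipeline, not the authors' code; every function name, data field, and threshold here is an illustrative stub standing in for the real crawler, separator, aligner, and quality filter.

```python
# Hypothetical sketch of a SingNet-style data pipeline.
# All stages are stubs: real implementations would call a web crawler,
# a source-separation model, a forced aligner, and quality filters.

def crawl(urls):
    # Stage 1: collect raw audio from the web (stubbed as tagged strings).
    return [{"url": u, "audio": f"raw:{u}"} for u in urls]

def separate_vocals(items):
    # Stage 2: source separation -- keep only the vocal stem.
    return [{**it, "audio": it["audio"].replace("raw:", "vocals:", 1)}
            for it in items]

def align_phonemes(items):
    # Stage 3: phoneme-level alignment (placeholder phoneme/time triples).
    return [{**it, "phonemes": [("a", 0.0, 0.5)]} for it in items]

def clean(items):
    # Stage 4: drop clips failing quality checks (stub: require a vocal stem).
    return [it for it in items if it["audio"].startswith("vocals:")]

def pipeline(urls):
    # Chain the stages into ready-to-use training items.
    return clean(align_phonemes(separate_vocals(crawl(urls))))
```

Structuring the stages as independent functions mirrors how such pipelines are usually run in practice: each stage can be re-executed or swapped (e.g. trying a different separation model) without touching the others.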