MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation

📅 2025-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-music (TTM) generation lacks automated evaluation methods that jointly ensure accuracy and efficiency—primarily due to the high subjectivity of musical quality assessment, challenges in cross-modal alignment between text and audio, and scarcity of high-quality human-annotated data. To address this, we introduce the first expert-annotated TTM evaluation dataset, comprising 2,748 music clips generated by 31 models and 13,740 fine-grained professional critiques. We further establish the first dedicated TTM benchmark and propose a learnable evaluation paradigm grounded in CLAP (Contrastive Language–Audio Pretraining), integrating music semantic annotation, cross-modal alignment, and regression-based scoring. Our learned evaluator achieves strong agreement with human judgments (Spearman ρ > 0.82), substantially outperforming conventional objective metrics (e.g., FAD, KL divergence). The framework provides a reliable, low-cost, and fully reproducible automated evaluation standard for TTM research and development.
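The regression-based scoring idea can be sketched in a toy form: concatenate text and audio embeddings together with their cosine similarity (a stand-in for the cross-modal alignment signal), fit a regression head to human ratings, and validate the learned scores against those ratings with Spearman correlation. The sketch below uses random NumPy arrays in place of real CLAP features and synthetic "human" scores; all shapes, names, and the ridge-regression head are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clips, dim = 200, 64
text_emb = rng.normal(size=(n_clips, dim))   # stand-in for CLAP text embeddings
audio_emb = rng.normal(size=(n_clips, dim))  # stand-in for CLAP audio embeddings

# Feature vector: both embeddings plus their cosine similarity,
# mirroring cross-modal alignment as an input signal.
cos = np.sum(text_emb * audio_emb, axis=1, keepdims=True) / (
    np.linalg.norm(text_emb, axis=1, keepdims=True)
    * np.linalg.norm(audio_emb, axis=1, keepdims=True)
)
X = np.hstack([text_emb, audio_emb, cos])

# Synthetic "human" mean opinion scores, correlated with the features.
w_true = rng.normal(size=X.shape[1])
human_mos = X @ w_true + 0.1 * rng.normal(size=n_clips)

# Ridge-regression scoring head, solved in closed form.
lam = 1.0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ human_mos)
pred = X @ w_hat

def spearman_rho(a, b):
    """Spearman correlation (Pearson on ranks; assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

rho = spearman_rho(pred, human_mos)
print(f"Spearman rho vs. human ratings: {rho:.3f}")
```

Because the synthetic scores are generated from the features themselves, the fitted head recovers them almost exactly; with real CLAP features and expert ratings, the same validation procedure yields the ρ > 0.82 agreement reported above.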

📝 Abstract
The technology for generating music from textual descriptions has seen rapid advancements. However, evaluating text-to-music (TTM) systems remains a significant challenge, primarily due to the difficulty of balancing performance and cost with existing objective and subjective evaluation methods. In this paper, we propose an automatic assessment task for TTM models to align with human perception. To address the TTM evaluation challenges posed by the professional requirements of music evaluation and the complexity of the relationship between text and music, we collect MusicEval, the first generative music assessment dataset. This dataset contains 2,748 music clips generated by 31 advanced and widely used models in response to 384 text prompts, along with 13,740 ratings from 14 music experts. Furthermore, we design a CLAP-based assessment model built on this dataset, and our experimental results validate the feasibility of the proposed task, providing a valuable reference for future development in TTM evaluation. The dataset is available at https://www.aishelltech.com/AISHELL_7A.
Problem

Research questions and friction points this paper is trying to address.

- Automatic Evaluation
- Text-to-Music
- Cost-Effectiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

- MusicEval Dataset
- Automatic Evaluation Method
- Text-to-Music Generation
Authors

Cheng Liu
TMCC, College of Computer Science, Nankai University, Tianjin, China

Hui Wang
TMCC, College of Computer Science, Nankai University, Tianjin, China

Jinghua Zhao
Nankai University

Shiwan Zhao
Independent Researcher; Research Scientist at IBM Research - China (2000-2020)
Interests: AGI, Large Language Models, NLP, Speech, Recommender Systems

Hui Bu
AISHELL
Interests: Speech recognition, speech databases and text corpora, special topics on speech databases and

Xin Xu
Beijing AISHELL Technology Co., Ltd.

Jiaming Zhou
TMCC, College of Computer Science, Nankai University, Tianjin, China

Haoqin Sun
Nankai University
Interests: Affective computing, speech signal processing, audio understanding

Yong Qin
TMCC, College of Computer Science, Nankai University, Tianjin, China