🤖 AI Summary
In text-to-audio (TTA) generation, evaluating text–audio relevance has long relied on costly human assessments or on objective metrics of questionable validity (e.g., CLAPScore). To address this, we introduce RELATE, the first open-source, human-annotated subjective evaluation dataset for TTA relevance assessment, covering diverse acoustic categories and providing fine-grained relevance scores. Leveraging RELATE, we train an end-to-end deep learning model to predict human relevance judgments automatically. Experiments demonstrate that our model significantly outperforms CLAPScore across all sound categories, achieves consistently high performance, and exhibits strong agreement with human raters (Spearman ρ > 0.72). This work establishes the first standardized subjective benchmark for TTA relevance evaluation and provides a reliable, automated assessment tool, thereby filling two critical gaps in the field.
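For reference, CLAPScore is commonly computed as the cosine similarity between CLAP embeddings of the caption and the generated audio. Below is a minimal sketch using the `laion_clap` package; the checkpoint choice and file paths are illustrative assumptions, not the paper's setup:

```python
# Minimal CLAPScore sketch: cosine similarity between CLAP text and
# audio embeddings. Checkpoint and preprocessing are assumptions here,
# not necessarily what the paper used.
import numpy as np
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()  # loads the default pretrained checkpoint

def clap_score(audio_path: str, caption: str) -> float:
    """Cosine similarity between one audio file and one caption."""
    a = model.get_audio_embedding_from_filelist(x=[audio_path], use_tensor=False)[0]
    t = model.get_text_embedding([caption], use_tensor=False)[0]
    return float(np.dot(a, t) / (np.linalg.norm(a) * np.linalg.norm(t)))

# Hypothetical usage on a generated sample:
print(clap_score("generated.wav", "a dog barking in the distance"))
```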
📝 Abstract
In text-to-audio (TTA) research, the relevance between the input text and the output audio is an important evaluation aspect. Traditionally, it has been assessed both subjectively and objectively. However, subjective evaluation is costly in money and time, while the correlation of objective metrics with subjective evaluation scores remains unclear. In this study, we construct RELATE, an open-source dataset of subjective relevance evaluations. We also benchmark a model that automatically predicts the subjective evaluation score from synthesized audio. Our model outperforms the conventional CLAPScore model, and this trend extends across many sound categories.
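To illustrate how agreement with human raters might be measured, the sketch below computes Spearman ρ between a model's predicted relevance scores and human ratings using `scipy.stats.spearmanr`; the arrays are placeholder values, not data from RELATE:

```python
# Hypothetical agreement check: Spearman rank correlation between
# predicted relevance scores and human ratings. Placeholder data only.
from scipy.stats import spearmanr

human_scores = [4.5, 2.0, 3.5, 5.0, 1.5, 4.0]        # e.g., mean opinion scores
model_scores = [0.82, 0.31, 0.64, 0.91, 0.22, 0.77]  # predicted relevance

rho, p_value = spearmanr(human_scores, model_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```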