PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark

📅 2026-03-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the absence of culturally and acoustically nuanced evaluation benchmarks for Persian audio-language models, particularly for classical poetic meter (vazn), traditional music, and code-switching. To bridge this gap, the authors introduce the first audio-language benchmark for Persian, comprising 16 tasks (10 of them newly proposed) and over 8,000 human-annotated samples, covering speech understanding, paralinguistic analysis, and cultural audio understanding. Experiments show that current models perform near chance level on culturally grounded, prosody-dependent tasks such as vazn detection, and that text-only baselines consistently outperform their audio counterparts, which suggests that models make little use of acoustic information beyond the transcript. These findings position the benchmark as a tool for advancing culturally aware audio-language models.

📝 Abstract

Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching, none of which are captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for evaluating large audio-language models on Persian language and culture, comprising 16 tasks and over 8,000 samples across speech understanding, paralinguistic analysis, and cultural audio understanding. Ten tasks are newly introduced, including poetry meter and style detection, traditional Persian music understanding, and code-switching detection. Text-only baselines consistently outperform their audio counterparts, suggesting that models may not leverage audio-specific information beyond what transcription alone provides. Culturally grounded tasks expose a qualitatively distinct failure mode: all models perform near random chance on vazn (poetic meter) detection regardless of scale, suggesting that prosodic perception remains beyond the reach of current models. The dataset is publicly available at https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench
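
To make the "near random chance" claim concrete, the sketch below shows one common way to check whether a classification accuracy is distinguishable from uniform guessing: an exact one-sided binomial test. It is an illustration only, not the paper's evaluation code. The function names, the significance threshold, and the example numbers (4 classes, 200 samples) are hypothetical, and the test assumes a balanced task where random guessing succeeds with probability 1/num_classes.

```python
from math import comb


def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing on a balanced task."""
    return 1.0 / num_classes


def p_value_above_chance(correct: int, total: int, num_classes: int) -> float:
    """Exact one-sided binomial p-value: P(X >= correct) under random guessing."""
    p = chance_accuracy(num_classes)
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )


def is_near_chance(correct: int, total: int, num_classes: int,
                   alpha: float = 0.05) -> bool:
    """True when the score cannot be distinguished from guessing at level alpha."""
    return p_value_above_chance(correct, total, num_classes) >= alpha


if __name__ == "__main__":
    # Hypothetical numbers, NOT taken from the paper: a 4-way task, 200 samples.
    for correct in (55, 120):
        verdict = "near chance" if is_near_chance(correct, 200, 4) else "above chance"
        print(f"{correct}/200 correct: {verdict}")
```

A test like this only addresses whether a score beats guessing. Comparing an audio model against its text-only baseline on the same samples would call for a paired comparison instead.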

Problem

Research questions and friction points this paper is trying to address.

Persian audio understanding

audio-language models

cultural audio understanding

code-switching

prosodic perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

Persian audio-language benchmark

cultural audio understanding

code-switching detection

poetry meter recognition

prosodic perception

🔎 Similar Papers

AudioBench: A Universal Benchmark for Audio Large Language Models

2024-06-23, arXiv.org, Citations: 17