π€ AI Summary
This work identifies and addresses cultural representation imbalance in music generation models: non-Western genres constitute only 5.7% of duration in prevalent training datasets, leading to significant cross-cultural performance disparities. We present the first systematic quantification of this cultural bias and propose a parameter-efficient fine-tuning (PEFT)-based fair cross-cultural transfer baseline. Within the MusicGen and MUS-Tango frameworks, we conduct few-shot adaptation experiments on Hindustani classical and Turkish maqam music using a newly curated multicultural dataset. Results demonstrate that PEFT substantially improves generative fidelity for non-Western genres, validating its efficacy in mitigating cultural biasβwhile also exposing fundamental challenges in few-shot cross-genre transfer. This study establishes the first empirical benchmark and methodological framework for multicultural AI music modeling.
π Abstract
The advent of Music-Language Models has greatly enhanced the automatic music generation capability of AI systems, but they are also limited in their coverage of the musical genres and cultures of the world. We present a study of the datasets and research papers for music generation and quantify the bias and under-representation of genres. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres, which naturally leads to disparate performance of the models across genres. We then investigate the efficacy of Parameter-Efficient Fine-Tuning (PEFT) techniques in mitigating this bias. Our experiments with two popular models -- MusicGen and Mustango, for two underrepresented non-Western music traditions -- Hindustani Classical and Turkish Makam music, highlight the promises as well as the non-triviality of cross-genre adaptation of music through small datasets, implying the need for more equitable baseline music-language models that are designed for cross-cultural transfer learning.