Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations

📅 2026-02-01
🤖 AI Summary
This study addresses the lack of systematic evaluation of bias in multilingual spoken large language models (MLLMs) under linguistic, demographic, and structural variations. We introduce and release BiasInEar—a novel dataset comprising 70.8 hours of speech and 11,200 questions—derived from Global MMLU Lite, covering English, Chinese, and Korean, with balanced gender representation and accent diversity. A unified framework for assessing fairness and robustness is proposed, employing four complementary metrics: accuracy, entropy, APES, and Fleiss’ κ. Through comprehensive analysis of nine MLLMs under various perturbations, we find that spoken input amplifies structural biases; models exhibit high sensitivity to language and answer-option ordering but relative robustness to demographic factors such as gender; and both model architecture and inference strategies significantly influence cross-lingual robustness.

📝 Abstract
This work presents the first systematic investigation of speech bias in multilingual MLLMs. We construct and release the BiasInEar dataset, a speech-augmented benchmark based on Global MMLU Lite, spanning English, Chinese, and Korean, balanced by gender and accent, and totaling 70.8 hours ($\approx$4,249 minutes) of speech with 11,200 questions. Using four complementary metrics (accuracy, entropy, APES, and Fleiss' $\kappa$), we evaluate nine representative models under linguistic (language and accent), demographic (gender), and structural (option order) perturbations. Our findings reveal that MLLMs are relatively robust to demographic factors but highly sensitive to language and option order, suggesting that speech can amplify existing structural biases. Moreover, architectural design and reasoning strategy substantially affect robustness across languages. Overall, this study establishes a unified framework for assessing fairness and robustness in speech-integrated LLMs, bridging the gap between text- and speech-based evaluation. The resources can be found at https://github.com/ntunlplab/BiasInEar.
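Among the four metrics, Fleiss' $\kappa$ quantifies how consistently a model answers the same question across perturbed conditions (e.g., reordered options or different speaker voices), treating each perturbed run as an independent "rater." The sketch below is a minimal, generic implementation of Fleiss' $\kappa$; the matrix shape and the interpretation of runs as raters are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np

def fleiss_kappa(ratings: np.ndarray) -> float:
    """Fleiss' kappa for an (items x categories) count matrix.

    ratings[i, j] = number of raters (here: perturbed runs) that
    assigned item i to answer option j. Assumes every item received
    the same number of ratings.
    """
    N, k = ratings.shape
    n = int(ratings[0].sum())                       # ratings per item
    p_j = ratings.sum(axis=0) / (N * n)             # option proportions
    P_i = (np.square(ratings).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), float(np.square(p_j).sum())
    return float((P_bar - P_e) / (1 - P_e))

# Toy example: 4 questions x 4 answer options, 5 perturbed runs each.
# Perfect cross-perturbation agreement yields kappa = 1.0.
perfect = np.array([[5, 0, 0, 0],
                    [0, 5, 0, 0],
                    [0, 0, 5, 0],
                    [5, 0, 0, 0]])
print(fleiss_kappa(perfect))  # 1.0
```

A $\kappa$ near 1 indicates answers stable under perturbation; values near 0 indicate agreement no better than chance, i.e., high sensitivity to the perturbed factor.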
Problem

Research questions and friction points this paper is trying to address.

speech bias, multilingual MLLMs, fairness, robustness, linguistic variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech bias, multilingual MLLMs, BiasInEar dataset, fairness evaluation, robustness assessment
Sheng-Lun Wei
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
Yu-Ling Liao
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
Yen-Hua Chang
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
Hen-Hsen Huang
Institute of Information Science, Academia Sinica, Taiwan
natural language processing, discourse analysis, information retrieval, Chinese processing
Hsin-Hsi Chen
Professor of Computer Science, National Taiwan University
Natural Language Processing, Information Retrieval, Information Extraction, Web Mining, Artificial Intelligence