SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current safety alignment evaluations for large audio language models (LALMs) are largely confined to monolingual, text-based harmful prompts, so they miss real-world risks that arise in multilingual, code-switched spoken interaction. This work introduces SpeechJBB, an audio jailbreak benchmark built around code-switched speech and phonologically plausible, natural-sounding speech perturbations. The study synthesizes multilingual harmful audio prompts and inserts phonologically valid pseudo-words around safety-critical terms, then measures jailbreak success rates (JSR) across diverse linguistic settings. Experiments show that code-switched harmful audio yields high jailbreak success rates. Non-English monolingual prompts and non-English code-switched pairs reach the highest attack success, and pseudo-word insertion further reduces the models' refusal rates. These findings indicate that existing safety mechanisms remain vulnerable to naturalistic, speech-based adversarial attacks and motivate alignment evaluation frameworks tailored to multilingual spoken interaction.
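Comparing linguistic settings comes down to a per-setting success rate. The sketch below is a minimal illustration. It assumes that each model response has already been labeled as jailbroken or not by some judge; the paper's judging procedure is not described here. The `Trial` record, the setting labels, and the function name are hypothetical, not the benchmark's actual format or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One model response to one harmful audio prompt (hypothetical record format)."""
    model: str
    setting: str      # illustrative labels, e.g. "en" (monolingual) or "en-hi" (code-switched pair)
    jailbroken: bool  # assumed verdict from an external judge; not the paper's procedure


def jailbreak_success_rates(trials):
    """Return {(model, setting): JSR}, where JSR = jailbroken responses / total responses."""
    counts = defaultdict(lambda: [0, 0])  # (model, setting) -> [successes, total]
    for t in trials:
        key = (t.model, t.setting)
        counts[key][0] += int(t.jailbroken)
        counts[key][1] += 1
    return {key: succ / total for key, (succ, total) in counts.items()}


trials = [
    Trial("model-a", "en", False),
    Trial("model-a", "en", True),
    Trial("model-a", "en-hi", True),
    Trial("model-a", "en-hi", True),
]
print(jailbreak_success_rates(trials))
```

Grouping by `(model, setting)` keeps each model's monolingual and code-switched rates side by side. Under this assumed labeling, the refusal rate is simply `1 - JSR`.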

📝 Abstract

Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on monolingual, text-based harmful prompts. As a result, how well that alignment generalizes to multilingual and spoken settings, particularly code-switched speech, remains largely unexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking multiple state-of-the-art LALMs. We further probe the extent of their safety weaknesses with an augmented setting in which phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields high jailbreak success rates (JSR), with non-English monolingual prompts and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, demonstrating that natural-sounding obfuscation can effectively bypass safety policies.
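The pseudo-word setting places pronounceable non-words next to a target term. The sketch below works at the text level, before any speech synthesis, and uses a benign target word. It is a minimal illustration under stated assumptions. The consonant-vowel (CV) syllable template, the phoneme lists, and the function names are hypothetical stand-ins, not the paper's phonotactic rules or generation pipeline.

```python
import random

# Illustrative phoneme inventory (assumption); the paper's actual phonotactics are not given here.
ONSETS = ["b", "d", "f", "k", "l", "m", "n", "p", "s", "t", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u"]


def make_pseudo_word(rng, n_syllables=2):
    """Build a pronounceable string from consonant-vowel (CV) syllables."""
    return "".join(rng.choice(ONSETS) + rng.choice(VOWELS) for _ in range(n_syllables))


def insert_pseudo_words(text, target, rng, n_syllables=2):
    """Place one pseudo-word immediately before and one immediately after each exact token match of `target`."""
    out = []
    for token in text.split():
        if token == target:
            out.extend([
                make_pseudo_word(rng, n_syllables),
                token,
                make_pseudo_word(rng, n_syllables),
            ])
        else:
            out.append(token)
    return " ".join(out)


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
print(insert_pseudo_words("please describe the widget assembly", "widget", rng))
```

The CV template only guarantees that the output is pronounceable. It does not guarantee that the output is a non-word, so a fuller version would also check candidates against a lexicon for each language involved.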

Problem

Research questions and friction points this paper is trying to address.

safety alignment

code-switched speech

large audio language models

audio jailbreak

multilingual spoken settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

code-switched speech

audio jailbreak

safety alignment