JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing research lacks a dedicated security benchmark for jailbreak attacks targeting Audio Language Models (ALMs). Method: We introduce JALMBench, the first ALM-specific jailbreak safety benchmark, comprising 2,200 text prompts and 51,381 audio clips (268 hours) and supporting 12 mainstream ALMs, 8 attack methods, and 5 defense strategies. We propose a unified cross-modal evaluation framework enabling systematic quantitative analysis along four dimensions: attack efficiency, topic sensitivity, speaker identity diversity, and representation robustness. Contribution/Results: Our evaluation reveals that ALMs are generally more vulnerable to audio-based jailbreak attacks than LLMs are to text-based ones; that vocal style and instruction-embedding mechanisms are critical vulnerability factors; and that several widely adopted defense strategies show significant efficacy limitations in audio-domain settings. This work establishes a foundational benchmark and analytical framework for advancing ALM security research.

๐Ÿ“ Abstract
Audio Language Models (ALMs) have made significant progress recently. These models integrate the audio modality directly into the model, rather than converting speech into text and feeding the text to Large Language Models (LLMs). While jailbreak attacks on LLMs have been extensively studied, the security of ALMs with audio modalities remains largely unexplored. Currently, there is no adversarial audio dataset or unified framework specifically designed to evaluate and compare attacks and ALMs. In this paper, we present JALMBench, the *first* comprehensive benchmark to assess the safety of ALMs against jailbreak attacks. JALMBench includes a dataset containing 2,200 text samples and 51,381 audio samples totaling over 268 hours. It supports 12 mainstream ALMs, 4 text-transferred and 4 audio-originated attack methods, and 5 defense methods. Using JALMBench, we provide an in-depth analysis of attack efficiency, topic sensitivity, voice diversity, and attack representations. Additionally, we explore mitigation strategies for the attacks at both the prompt level and the response level.
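The paper does not specify its aggregation code, but a benchmark of this shape (many models × many attacks, each trial judged jailbroken or not) is typically summarized as per-(model, attack) attack success rates. The sketch below is a minimal, hypothetical illustration of that bookkeeping; the record fields and model/attack names are assumptions, not JALMBench's actual API.

```python
from collections import defaultdict

def attack_success_rate(records):
    """Aggregate per-trial jailbreak verdicts into per-(model, attack) success rates.

    Each record is a dict with keys 'model', 'attack', and 'jailbroken'
    (a bool, e.g. a judge's verdict on the model's response).
    Field names here are illustrative, not JALMBench's actual schema.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for r in records:
        key = (r["model"], r["attack"])
        totals[key] += 1
        if r["jailbroken"]:
            successes[key] += 1
    return {k: successes[k] / totals[k] for k in totals}

# Toy example: two hypothetical ALMs, one audio-originated attack
records = [
    {"model": "ALM-A", "attack": "audio_perturb", "jailbroken": True},
    {"model": "ALM-A", "attack": "audio_perturb", "jailbroken": False},
    {"model": "ALM-B", "attack": "audio_perturb", "jailbroken": True},
]
rates = attack_success_rate(records)
```

Slicing the same records by topic or speaker identity instead of (model, attack) would yield the topic-sensitivity and voice-diversity views the paper analyzes.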
Problem

Research questions and friction points this paper is trying to address.

Assessing ALM safety against jailbreak attacks
Lack of an adversarial audio dataset for ALMs
No unified framework for evaluating attacks on ALMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark for ALM jailbreak safety
Dataset with 2,200 text and 51,381 audio samples
Supports 12 ALMs, 8 attack methods, and 5 defense methods