MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

📅 2026-06-05

🤖 AI Summary

This work addresses the lack of systematic safety evaluation for multilingual vision-language models (VLMs) under structured visual prompts, particularly flowcharts, and highlights blind spots in non-English settings. The authors introduce MLingualFC, a multimodal safety benchmark for multilingual VLMs that encodes harmful instructions into flowchart images across five languages and assesses jailbreak risk in a black-box setting against prominent models such as Qwen2.5-VL, Gemma-4, and Pangea. The results indicate that current safety alignment does not generalize well across language and modality boundaries. Attacks in Latin-script languages achieve high success rates, showing that visually encoded harmful content can bypass safety alignment. Lower success rates for non-Latin scripts (e.g., Punjabi) appear to reflect limitations in visual text recognition rather than stronger model safety.

📝 Abstract

Vision-Language Models (VLMs) have demonstrated strong performance across multimodal tasks, yet their safety robustness remains an open challenge. While prior work has shown that structured visual prompts such as flowcharts can effectively jailbreak VLMs, existing studies are largely limited to English-centric settings. In this paper, we introduce MLingualFC, a multilingual multimodal benchmark designed to evaluate jailbreak vulnerabilities of VLMs across diverse languages using structured flowchart representations. MLingualFC encodes harmful instructions into flowchart images across five languages (Hindi, Punjabi, Spanish, Romanian, and German). We evaluate state-of-the-art multilingual VLMs, including Qwen2.5-VL, Gemma-4, and Pangea, under a black-box threat model. Our results reveal significant multilingual safety gaps. Flowchart-based attacks achieve high attack success rates (ASR) for Latin-script languages, demonstrating that visual encoding of harmful content effectively bypasses safety alignment across languages. In contrast, non-Latin-script languages such as Punjabi exhibit substantially lower ASR, suggesting limitations in visual text recognition rather than stronger safety alignment. These findings highlight that current VLM safety mechanisms fail to generalize across languages and modalities. Resources are available at https://github.com/Rishabhpm23/MLingualFC
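
The abstract reports attack success rate (ASR) per language and contrasts Latin with non-Latin scripts. As a rough illustration of how such a metric is commonly aggregated, the sketch below computes ASR as successful attacks divided by total attempts, grouped by language or by script. The record layout, the function name `attack_success_rate`, and the sample values are all hypothetical placeholders; they are not data or code from the paper, and the paper's actual judging procedure may differ.

```python
from collections import defaultdict

# Hypothetical judged outcomes: (language, script, attack_succeeded).
# These values are illustrative placeholders, NOT results from the paper.
records = [
    ("Spanish", "Latin", True),
    ("Spanish", "Latin", False),
    ("German", "Latin", True),
    ("Punjabi", "Gurmukhi", False),
    ("Punjabi", "Gurmukhi", False),
    ("Hindi", "Devanagari", True),
]


def attack_success_rate(records, key_index):
    """Return ASR per group, where ASR = successful attacks / total attempts.

    key_index selects the grouping field: 0 for language, 1 for script.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rec in records:
        key = rec[key_index]
        totals[key] += 1
        successes[key] += int(rec[2])  # True counts as 1, False as 0
    return {key: successes[key] / totals[key] for key in totals}


asr_by_language = attack_success_rate(records, 0)
asr_by_script = attack_success_rate(records, 1)

for language, asr in sorted(asr_by_language.items()):
    print(f"{language}: ASR = {asr:.2f}")
```

Grouping by script as well as by language makes the abstract's caveat easier to examine: a low ASR for one script could reflect poor visual text recognition rather than stronger alignment, so ASR alone would not settle which explanation holds.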

Problem

Research questions and friction points this paper is trying to address.

jailbreak vulnerabilities

multilingual vision-language models

safety robustness

flowchart-based attacks

cross-lingual generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual vision-language models

jailbreak attacks

flowchart-based prompts