Jailbreaking Large Vision Language Models in Intelligent Transportation Systems

📅 2025-11-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large vision-language models (LVLMs) exhibit severe jailbreaking vulnerabilities in intelligent transportation systems (ITS), where typography-based image manipulations and multi-turn adversarial prompting can induce harmful outputs. Method: We introduce the first ITS-specific harmful query dataset, propose a novel jailbreaking approach leveraging image typography perturbations and iterative adversarial prompting, and design a multi-layer defense mechanism integrating GPT-4 toxicity scoring, rule-based filtering, and human verification. Contribution/Results: Extensive experiments across mainstream open- and closed-source LVLMs demonstrate that current models are broadly susceptible to image-guided jailbreaking. Our defense reduces the harmful response rate by up to 62.3%. This work constitutes the first systematic investigation of LVLM security risks in ITS and provides a scalable, end-to-end framework for evaluating both attacks and defenses in transportation-oriented multimodal AI systems.

Technology Category

Application Category

📝 Abstract

Large Vision Language Models (LVLMs) demonstrate strong capabilities in multimodal reasoning and many real-world applications, such as visual question answering. However, LVLMs are highly vulnerable to jailbreaking attacks. This paper systematically analyzes the vulnerabilities of LVLMs integrated in Intelligent Transportation Systems (ITS) under carefully crafted jailbreaking attacks. First, we carefully construct a dataset with harmful queries relevant to transportation, following OpenAI's prohibited categories to which the LVLMs should not respond. Second, we introduce a novel jailbreaking attack that exploits the vulnerabilities of LVLMs through image typography manipulation and multi-turn prompting. Third, we propose a multi-layered response filtering defense technique to prevent the model from generating inappropriate responses. We perform extensive experiments with the proposed attack and defense on the state-of-the-art LVLMs (both open-source and closed-source). To evaluate the attack method and defense technique, we use GPT-4's judgment to determine the toxicity score of the generated responses, as well as manual verification. Further, we compare our proposed jailbreaking method with existing jailbreaking techniques and highlight severe security risks involved with jailbreaking attacks with image typography manipulation and multi-turn prompting in the LVLMs integrated in ITS.

Problem

Research questions and friction points this paper is trying to address.

Analyzing LVLM vulnerabilities to jailbreaking attacks in transportation systems

Developing image typography manipulation for multimodal jailbreaking attacks

Proposing multi-layered defense techniques against inappropriate AI responses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs harmful transportation query dataset

Introduces image typography manipulation attacks

Proposes multi-layered response filtering defense

🔎 Similar Papers

No similar papers found.

Authors to Follow