🤖 AI Summary
Existing multi-turn jailbreaking attacks designed for large language models often fail when transferred to vision-language models (LVLMs), as they are readily intercepted by safety alignment mechanisms and struggle to elicit harmful outputs. To address this limitation, this work proposes MAPA, a novel dual-level adaptive attack framework that alternates between textual and visual adversarial actions within each turn while dynamically optimizing the attack trajectory across multiple turns to progressively amplify maliciousness in model responses. By integrating multi-turn adaptive prompting with a joint text-vision strategy, MAPA substantially enhances attack efficacy. Experiments on mainstream LVLMs—including LLaVA-V1.6-Mistral-7B and Qwen2.5-VL-7B-Instruct—demonstrate attack success rates 11%–35% higher than those of current state-of-the-art methods.
📝 Abstract
Multi-turn jailbreak attacks are effective against text-only large language models (LLMs) because they introduce malicious content gradually across turns. When extending them to large vision-language models (LVLMs), we find that naively adding visual inputs makes existing multi-turn jailbreaks easy to defend against: an overly malicious visual input readily triggers the safety-alignment mechanisms of LVLMs, pushing responses toward conservatism. To address this, we propose MAPA, a multi-turn adaptive prompting attack that 1) within each turn, alternates between textual and visual attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 11–35% on recent benchmarks against LLaVA-V1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct, and GPT-4o-mini.
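The two-level structure described above can be sketched as a simple control loop. Note this is an illustrative assumption of how the inner (per-turn action selection) and outer (cross-turn trajectory refinement) levels might interact, not the paper's actual implementation: `query_lvlm`, the mock scoring function, and the backtracking rule are all hypothetical placeholders.

```python
# Hedged sketch of a two-level adaptive attack loop in the spirit of MAPA.
# All names and scoring here are illustrative placeholders, not the paper's code.
import random

random.seed(0)  # deterministic for illustration

def query_lvlm(text, image):
    # Placeholder for querying the target LVLM. Returns a mock response and a
    # mock "maliciousness" score in [0, 1]; a real attack would call the model
    # and score its output with a judge.
    score = min(1.0, 0.01 * len(text) + (0.2 if image else 0.0) + 0.3 * random.random())
    return f"response to {text!r}", score

def mapa_style_attack(goal, max_turns=4):
    """Inner level: pick the better of a text vs. vision action each turn.
    Outer level: refine or backtrack the trajectory based on the score."""
    history, best_score = [], 0.0
    prompt = goal
    for turn in range(max_turns):
        # Inner level: try a textual action and a visual action, keep the one
        # that elicits the more malicious (higher-scoring) response.
        candidates = [
            query_lvlm(prompt + " [text action]", None),
            query_lvlm(prompt, "adversarial_image"),
        ]
        response, score = max(candidates, key=lambda c: c[1])
        # Outer level: if the response regressed (score dropped), backtrack to
        # a milder prompt; otherwise escalate the trajectory toward the goal.
        if score < best_score:
            prompt = goal  # back off: restart from the base prompt
        else:
            best_score = score
            prompt += " [escalate]"  # refine: push one step further
        history.append((turn, response, round(score, 3)))
    return history, best_score
```

Under these assumptions, the outer loop's backtracking is what keeps the attack below the model's refusal threshold while the inner loop extracts the most malicious response each turn.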