Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

📅 2026-02-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing multi-turn jailbreaking attacks designed for large language models often fail when transferred to large vision-language models (LVLMs), as they are readily intercepted by safety alignment mechanisms and struggle to elicit harmful outputs. To address this limitation, this work proposes MAPA, a dual-level adaptive attack framework that alternates between textual and visual adversarial actions within each turn while dynamically optimizing the attack trajectory across turns to progressively amplify the maliciousness of model responses. By combining multi-turn adaptive prompting with a joint text-vision strategy, MAPA substantially improves attack efficacy. Experiments on mainstream LVLMs, including LLaVA-V1.6-Mistral-7B and Qwen2.5-VL-7B-Instruct, demonstrate attack success rates 11%–35% higher than those of current state-of-the-art methods.

📝 Abstract
Multi-turn jailbreak attacks are effective against text-only large language models (LLMs) because they introduce malicious content gradually across turns. When extending them to large vision-language models (LVLMs), we find that naively adding visual inputs can make existing multi-turn jailbreaks easy to defend against. For example, an overly malicious visual input readily triggers the defense mechanisms of safety-aligned LVLMs, making their responses more conservative. To address this, we propose MAPA: a multi-turn adaptive prompting attack that 1) at each turn, alternates between text and vision attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 11%–35% on recent benchmarks against LLaVA-V1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct, and GPT-4o-mini.
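The two-level design described in the abstract can be sketched as a greedy loop: within each turn, a textual action and a visual action compete and the one eliciting the higher-scored response is kept; across turns, the trajectory backtracks when the maliciousness score stops increasing. This is a minimal illustrative sketch, not the authors' implementation; every name here (`respond`, `score`, `refine_text`, `refine_image`) is an assumed stand-in for components the paper does not specify.

```python
def mapa_attack(goal, respond, score, refine_text, refine_image,
                init_text, init_image, max_turns=5, max_backtracks=3):
    """Hypothetical sketch of a dual-level adaptive attack.

    Turn level: try one textual and one visual action, keep the better.
    Trajectory level: pop the last turn and re-plan when the score stalls.
    """
    history = []                      # accepted (text, image, response) turns
    text, image, best = init_text, init_image, float("-inf")
    backtracks = 0
    while len(history) < max_turns:
        # Turn level: alternate text-vision attack actions.
        candidates = [
            (refine_text(text, goal, history), image),   # textual action
            (text, refine_image(image, goal, history)),  # visual action
        ]
        scored = [(score(respond(history, t, i), goal), t, i)
                  for t, i in candidates]
        s, t, i = max(scored, key=lambda c: c[0])
        # Trajectory level: backtrack if the response got no more malicious.
        if s <= best and history and backtracks < max_backtracks:
            history.pop()
            backtracks += 1
            continue
        history.append((t, i, respond(history, t, i)))
        text, image, best = t, i, max(best, s)
    return history
```

In this sketch the per-turn choice is a one-step greedy selection and the cross-turn refinement is simple backtracking; the paper's actual trajectory optimization may be more elaborate.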
Problem

Research questions and friction points this paper is trying to address.

multi-turn jailbreak attack
large vision-language models
visual inputs
safety-aligned LVLMs
malicious content
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-turn adaptive prompting
vision-language models
jailbreak attack
safety alignment
iterative refinement
🔎 Similar Papers
No similar papers found.