LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops

📅 2025-06-17
🤖 AI Summary
Problem: Multimodal large language models (MLLMs) are vulnerable to resource-exhaustion attacks during inference, which inflate computational overhead and latency. Method: This paper introduces LingoLoop, an adversarial attack that exploits part-of-speech (POS) information. It first identifies the POS tag of a token as a strong influence on end-of-sequence (EOS) token probability. It then proposes a POS-Aware Delay Mechanism, which adjusts attention weights under POS guidance to postpone EOS generation, and a Generative Path Pruning Mechanism, which limits hidden-state magnitude to constrain output diversity and push the model into persistent repetitive loops. Results: Evaluated on models including Qwen2.5-VL-3B, the attack increases the generated token count by up to 30×, consistently triggers maximum generation length limits, and significantly amplifies energy consumption and inference latency, demonstrating a practical, syntax-aware resource-exhaustion threat to MLLMs.

📝 Abstract
Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS and of sentence-level structural patterns on output counts, which limits their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments demonstrate that LingoLoop can increase generated tokens by up to 30 times and energy consumption by a comparable factor on models like Qwen2.5-VL-3B, consistently driving MLLMs towards their maximum generation limits. These findings expose significant vulnerabilities in MLLMs, posing challenges for their reliable deployment. The code will be released publicly following the paper's acceptance.
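The abstract's first observation, that a token's POS tag affects how likely EOS is to follow, can be illustrated with a small grouping routine. The sketch below is not the authors' analysis code: the function name `mean_eos_prob_by_pos`, the record format, and every number in `toy_records` are assumptions invented for illustration. In practice the pairs would come from a POS tagger and from a model's per-step EOS probabilities.

```python
from collections import defaultdict


def mean_eos_prob_by_pos(records):
    """Average the EOS probability observed after tokens of each POS tag.

    records: iterable of (pos_tag, eos_prob) pairs, one per generated token.
    Returns a dict mapping each POS tag to its mean EOS probability.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pos_tag, eos_prob in records:
        totals[pos_tag] += eos_prob
        counts[pos_tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


# Hypothetical values, made up for this example (not taken from the paper).
toy_records = [
    ("PUNCT", 0.5), ("PUNCT", 0.25),
    ("NOUN", 0.125), ("NOUN", 0.125),
    ("DET", 0.0),
]
stats = mean_eos_prob_by_pos(toy_records)
```

A table like `stats` would show which POS categories tend to precede EOS, which is the kind of signal a POS-guided mechanism could then act on.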
Problem

Research questions and friction points this paper is trying to address.

Exploiting MLLMs to generate excessive output via linguistic traps
Delaying EOS token generation using POS-aware attention manipulation
Inducing repetitive loops by constraining output diversity and hidden states
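To make "excessive output" and "repetitive loops" concrete, the following sketch shows two simple measurements one might apply to a generated token sequence. Both helpers (`ngram_repetition_ratio`, `length_amplification`) and the sample sequences are hypothetical; the paper's own evaluation metrics are not described in this summary and may differ.

```python
def ngram_repetition_ratio(tokens, n=3):
    """Fraction of n-grams that duplicate another n-gram in the sequence.

    0.0 means every n-gram is unique; values near 1.0 indicate a loop.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def length_amplification(attacked_len, clean_len):
    """Ratio of attacked output length to clean output length."""
    return attacked_len / clean_len


# Made-up sequences for illustration only.
looped = ["a", "b", "c"] * 4          # a short pattern repeated four times
varied = ["a", "b", "c", "d", "e"]    # no repeated trigram

loop_ratio = ngram_repetition_ratio(looped)
varied_ratio = ngram_repetition_ratio(varied)
amplification = length_amplification(attacked_len=1024, clean_len=64)
```

Here `looped` contains ten trigrams of which only three are distinct, so its ratio is high, while `varied` scores zero.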
Innovation

Methods, ideas, or system contributions that make the work stand out.

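POS-Aware Delay Mechanism adjusts attention weights, guided by POS information, to postpone EOS token generation
Generative Path Pruning Mechanism limits the magnitude of hidden states to encourage persistent loops
Induces verbose and repetitive output sequences, increasing generated tokens by up to 30× on models like Qwen2.5-VL-3B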
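The two mechanisms above can be pictured as two terms of an attack objective. The sketch below is a deliberately simplified stand-in, not the paper's formulation: it weights per-step EOS probabilities by POS tag instead of manipulating attention weights, and it penalizes the mean L2 norm of hidden states. All names (`pos_weighted_eos_loss`, `hidden_norm_loss`, `combined_loss`), the weight table, and the coefficient `lam` are assumptions.

```python
import math


def pos_weighted_eos_loss(eos_probs, pos_tags, pos_weights):
    """Mean of per-step EOS probabilities, scaled by a per-POS weight.

    Minimizing this term pushes EOS probability down, most strongly at
    steps whose POS tag carries a large weight. Unlisted tags get 1.0.
    """
    weighted = [pos_weights.get(tag, 1.0) * p
                for p, tag in zip(eos_probs, pos_tags)]
    return sum(weighted) / len(weighted)


def hidden_norm_loss(hidden_states):
    """Mean L2 norm of the per-step hidden-state vectors."""
    norms = [math.sqrt(sum(x * x for x in h)) for h in hidden_states]
    return sum(norms) / len(norms)


def combined_loss(eos_probs, pos_tags, pos_weights, hidden_states, lam=0.5):
    """Toy objective: POS-weighted EOS term plus lam times the norm term."""
    return (pos_weighted_eos_loss(eos_probs, pos_tags, pos_weights)
            + lam * hidden_norm_loss(hidden_states))


# Hypothetical inputs for two generation steps.
eos_probs = [0.5, 0.25]
pos_tags = ["PUNCT", "NOUN"]
pos_weights = {"PUNCT": 2.0}              # assumed: penalize EOS after punctuation more
hidden_states = [[3.0, 4.0], [0.0, 0.0]]  # toy 2-dimensional hidden states

loss = combined_loss(eos_probs, pos_tags, pos_weights, hidden_states)
```

In a real attack such a scalar would be minimized with respect to an adversarial perturbation of the input through a differentiable framework; this version only shows how the two terms combine.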