ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

📅 2025-10-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the failure of large language models (LLMs) to maintain safety alignment when processing non-standard textual representations, specifically ASCII art, which exposes a vulnerability in alignment approaches that focus on semantic interpretation. We propose ArtPerception, a black-box jailbreak framework built on ASCII art. Its core contributions are threefold: (1) a one-time, model-specific pre-test that empirically determines the parameters under which a given model best recognizes ASCII art; (2) a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of that recognition capability; and (3) a one-shot attack phase that uses the pre-test results instead of iterative, brute-force querying. Experiments report >92% jailbreak success across four open-source LLMs and successful transfer to commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3. The framework is also evaluated against defenses such as Llama Guard and Azure's content filters. The results indicate that semantic alignment is fragile under non-canonical text inputs.

📝 Abstract

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data representations. This paper introduces ArtPerception, a novel black-box jailbreak framework that strategically leverages ASCII art to bypass the security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that rely on iterative, brute-force attacks, ArtPerception introduces a systematic, two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to empirically determine the optimal parameters for ASCII art recognition. Phase 2 leverages these insights to launch a highly efficient, one-shot malicious jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of an LLM's recognition capability. Through comprehensive experiments on four SOTA open-source LLMs, we demonstrate superior jailbreak performance. We further validate our framework's real-world relevance by showing its successful transferability to leading commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting a rigorous effectiveness analysis against potential defenses such as LLaMA Guard and Azure's content filters. Our findings underscore that true LLM security requires defending against a multi-modal space of interpretations, even within text-only inputs, and highlight the effectiveness of strategic, reconnaissance-based attacks. Content Warning: This paper includes potentially harmful and offensive model outputs.

Problem

Research questions and friction points this paper is trying to address.

Safety alignment that focuses on semantic interpretation leaves LLMs vulnerable to non-standard data representations such as ASCII art

Prior ASCII art jailbreak methods rely on iterative, brute-force attacks rather than a systematic methodology

An LLM's ability to recognize ASCII art lacks a nuanced evaluation metric

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ASCII art to bypass LLM security measures

Employs pre-test phase for optimal parameter selection

Introduces Modified Levenshtein Distance for recognition evaluation
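
To make the recognition-evaluation idea concrete, here is a minimal sketch of the classic Levenshtein distance used to score how closely a model's transcription matches the text that was rendered. This is not the paper's MLD: the summary does not say how the distance is modified, so the code below shows only the standard metric. The function names `levenshtein` and `recognition_score`, the case folding, and the length normalization are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    # Keep the shorter string on the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def recognition_score(expected: str, predicted: str) -> float:
    """Hypothetical normalized similarity in [0, 1]; 1.0 means an exact match.

    Assumption: comparison is case-insensitive and ignores surrounding
    whitespace. The paper's actual normalization may differ.
    """
    expected = expected.strip().upper()
    predicted = predicted.strip().upper()
    longest = max(len(expected), len(predicted))
    if longest == 0:
        return 1.0  # two empty strings match trivially
    return 1.0 - levenshtein(expected, predicted) / longest


if __name__ == "__main__":
    # A transcription with one wrong character out of five.
    print(recognition_score("HELLO", "HELLP"))
```

A graded score like this distinguishes a near-miss transcription from a complete failure, which a plain exact-match check cannot do. That is presumably the sense in which the abstract calls its metric "more nuanced".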

🔎 Similar Papers

Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity