Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

📅 2025-02-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to progressive jailbreaking attacks in multi-turn dialogues, which can systematically undermine their safety mechanisms. We propose a novel multi-turn jailbreaking method inspired by the social-psychological "foot-in-the-door" effect: through carefully designed incremental prompts, the method gradually lowers the model's defensive threshold, inducing self-alignment drift and eliciting harmful outputs. To our knowledge, this is the first systematic integration of psychological principles into LLM jailbreaking research, revealing a previously underexplored self-corrosion mechanism under iterative interaction and challenging foundational safety assumptions of current alignment strategies. Evaluated on two established benchmarks across seven state-of-the-art LLMs, our method achieves an average attack success rate of 94%, substantially outperforming existing SOTA approaches.

๐Ÿ“ Abstract
Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreak, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by psychological foot-in-the-door principles, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon where minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and uses the model's own responses to align it toward producing toxic outputs. Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. Additionally, we provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and emphasizing the risks inherent in multi-turn interactions. The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.
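The core loop the abstract describes (an escalating ladder of bridge prompts, with the model's own prior replies kept in context) can be sketched as follows. This is an illustrative outline only, not the paper's implementation: `query_model` is a hypothetical stub standing in for a real chat-model API call, and the `ladder` entries are placeholders for the benign-to-target prompt sequence.

```python
def query_model(history, prompt):
    """Hypothetical stub for an LLM chat call; a real run would send
    `history` plus `prompt` to a model API and return its reply."""
    return f"response to: {prompt}"

def fitd_dialogue(bridge_prompts):
    """Run escalating bridge prompts in a single conversation.

    Each (prompt, reply) pair is appended to the shared history, so the
    model's own earlier compliance becomes context for the next, slightly
    harsher request -- the foot-in-the-door escalation pattern.
    """
    history = []
    for prompt in bridge_prompts:
        reply = query_model(history, prompt)
        history.append((prompt, reply))
    return history

# Placeholder escalation ladder: benign -> borderline -> target query.
ladder = ["benign request", "intermediate bridge request", "target request"]
transcript = fitd_dialogue(ladder)
```

The design point the sketch captures is that the attack state lives entirely in the accumulated conversation history, not in any single prompt.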
Problem

Research questions and friction points this paper is trying to address.

Multi-turn jailbreak vulnerability
AI safety vulnerabilities
Toxic response induction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn jailbreak method
Intermediate bridge prompts
LLM self-corruption analysis