Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

📅 2025-02-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to progressive jailbreaking attacks in multi-turn dialogues, which can systematically undermine their safety mechanisms. We propose a novel multi-turn jailbreaking method inspired by the social-psychological "foot-in-the-door" effect: through carefully designed incremental prompts, the method gradually lowers the model's defensive threshold, inducing self-alignment drift and eliciting harmful outputs. To our knowledge, this is the first systematic integration of psychological principles into LLM jailbreaking research, revealing a previously underexplored self-corrosion mechanism under iterative interaction and challenging foundational safety assumptions of current alignment strategies. Evaluated on two established benchmarks across seven state-of-the-art LLMs, our method achieves an average attack success rate of 94%, substantially outperforming existing SOTA approaches.

๐Ÿ“ Abstract
Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreak, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by psychological foot-in-the-door principles, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon where minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and uses the model's own responses to align it toward producing toxic outputs. Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. Additionally, we provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and emphasizing the risks inherent in multi-turn interactions. The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.
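The core loop the abstract describes (an escalating ladder of bridge prompts, with the model's own prior replies kept in context) can be sketched as follows. This is an illustrative outline only, not the paper's implementation: `query_model` is a hypothetical stub standing in for a real chat-model API call, and the `ladder` entries are placeholders for the benign-to-target prompt sequence.

```python
def query_model(history, prompt):
    """Hypothetical stub for an LLM chat call; a real run would send
    `history` plus `prompt` to a model API and return its reply."""
    return f"response to: {prompt}"

def fitd_dialogue(bridge_prompts):
    """Run escalating bridge prompts in a single conversation.

    Each (prompt, reply) pair is appended to the shared history, so the
    model's own earlier compliance becomes context for the next, slightly
    harsher request -- the foot-in-the-door escalation pattern.
    """
    history = []
    for prompt in bridge_prompts:
        reply = query_model(history, prompt)
        history.append((prompt, reply))
    return history

# Placeholder escalation ladder: benign -> borderline -> target query.
ladder = ["benign request", "intermediate bridge request", "target request"]
transcript = fitd_dialogue(ladder)
```

The design point the sketch captures is that the attack state lives entirely in the accumulated conversation history, not in any single prompt.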
Problem

Research questions and friction points this paper is trying to address.

Multi-turn jailbreak vulnerability
AI safety vulnerabilities
Toxic response induction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn jailbreak method
Intermediate bridge prompts
LLM self-corruption analysis