AI Summary
This work addresses the vulnerability of large language models (LLMs) to progressive jailbreaking attacks in multi-turn dialogues, which can systematically undermine their safety mechanisms. We propose a novel multi-turn jailbreaking method inspired by the social-psychological "foot-in-the-door" effect: through carefully designed incremental prompts, the method gradually lowers the model's defensive threshold, inducing self-alignment drift and eliciting harmful outputs. To our knowledge, this is the first systematic integration of psychological principles into LLM jailbreaking research, revealing a previously underexplored self-corrosion mechanism under iterative interaction and challenging foundational safety assumptions of current alignment strategies. Evaluated on two established benchmarks across seven state-of-the-art LLMs, our method achieves an average attack success rate of 94%, substantially outperforming existing SOTA approaches.
Abstract
Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreaking, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by the psychological foot-in-the-door principle, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon where minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and uses the model's own responses to align it toward increasingly toxic outputs. Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. Additionally, we provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and emphasizing the risks inherent in multi-turn interactions. The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.