Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the risk that generative AI models (e.g., ChatGPT 4o Mini, DeepSeek) can be maliciously exploited for phishing attacks under jailbreaking conditions. Method: within an ethically governed experimental framework, we systematically identify vulnerabilities in model safety guardrails and evaluate the models' susceptibility to generating multimodal phishing artifacts (deceptive emails, SMS messages, and voice scripts), recommending offensive tools, and orchestrating end-to-end attack workflows. Results: human-guided, AI-assisted phishing exhibits significantly enhanced stealth and evades conventional detection systems. This work provides the first empirical evidence of large language models' post-jailbreak amplification effect on social engineering attacks. Accordingly, we propose a novel three-tiered defense paradigm integrating user awareness training, robust identity authentication, and adaptive regulatory oversight, offering actionable technical pathways and policy-relevant insights for AI security governance.

📝 Abstract
The advent of advanced Generative AI (GenAI) models such as DeepSeek and ChatGPT has significantly reshaped the cybersecurity landscape, introducing both promising opportunities and critical risks. This study investigates how GenAI-powered chatbot services can be exploited via jailbreaking techniques to bypass ethical safeguards, enabling the generation of phishing content, the recommendation of hacking tools, and the orchestration of phishing campaigns. In ethically controlled experiments, we used ChatGPT 4o Mini, selected for its accessibility and its status as the latest publicly available model at the time of experimentation, as a representative GenAI system. Our findings reveal that the model could successfully guide novice users in executing phishing attacks across various vectors, including web, email, SMS (smishing), and voice (vishing). Unlike automated phishing campaigns, which typically follow detectable patterns, these human-guided, AI-assisted attacks can evade traditional anti-phishing mechanisms, posing a growing security threat. We focused on DeepSeek and ChatGPT due to their widespread adoption and technical relevance in 2025. The study further examines common jailbreaking techniques and the specific vulnerabilities exploited in these models. Finally, we evaluate a range of mitigation strategies, including user education, advanced authentication mechanisms, and regulatory policy measures, and discuss emerging trends in GenAI-facilitated phishing, outlining future research directions to strengthen cybersecurity defenses in the age of artificial intelligence.
Problem

Research questions and friction points this paper is trying to address.

Exploiting GenAI jailbreaking to bypass ethical safeguards
Generating phishing content via AI-assisted human guidance
Evading traditional anti-phishing mechanisms with AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jailbreaking GenAI to bypass ethical safeguards
Human-guided AI phishing evades traditional defenses
Evaluating mitigation strategies for GenAI phishing