Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Phishing attacks are growing increasingly sophisticated, necessitating detection methods that simultaneously achieve high accuracy, low computational overhead, and model interpretability. This paper systematically evaluates the performance trade-offs of traditional machine learning, deep learning, and quantized lightweight large language models (e.g., DeepSeek R1 Distill Qwen 14B, Q8_0) for phishing email detection, with emphasis on zero-shot and few-shot prompting capabilities and decision explainability. We propose a novel LLM-based rewriting robustness evaluation paradigm to expose vulnerabilities of existing detectors to LLM-generated phishing content. Experimental results show that the quantized LLM achieves above 80% accuracy using only 17 GB of GPU memory and, although it trails the ML and DL baselines in raw accuracy, offers stronger adversarial robustness, sub-second inference latency, and real-time attribution. Our work establishes a viable pathway toward deployable, resource-efficient AI security systems for constrained environments.
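The 17 GB figure is consistent with a back-of-envelope estimate for an 8-bit quantized 14B model. A minimal sketch, assuming llama.cpp-style Q8_0 storage (roughly 8.5 bits per weight once per-block fp16 scales are counted) and an illustrative overhead allowance for KV cache and activations; none of these constants are taken from the paper:

```python
# Rough VRAM estimate for a Q8_0-quantized "14B" model.
# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale per block,
# i.e. about 8.5 bits per weight. All figures below are illustrative.

PARAMS = 14.8e9          # approximate parameter count of a 14B-class model
BITS_PER_WEIGHT = 8.5    # Q8_0: 8-bit weights + per-block fp16 scales

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
overhead_gb = 1.5        # assumed KV cache + activation buffers (context-dependent)
total_gb = weights_gb + overhead_gb

print(f"weights ≈ {weights_gb:.1f} GB, total ≈ {total_gb:.1f} GB")
```

With these assumptions the weights alone come to about 15.7 GB, and the total lands near the reported 17 GB footprint.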

📝 Abstract
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate the impact of zero-shot and few-shot prompting strategies, revealing that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models' adversarial robustness and cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.
Problem

Research questions and friction points this paper is trying to address.

Compare phishing detection accuracy of ML, DL, and quantized LLMs
Evaluate LLMs' potential for identifying context-based phishing cues
Assess cost-performance tradeoffs and adversarial robustness in detection
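Classical ML detectors of the kind compared here typically operate on hand-crafted surface features of an email. A minimal sketch of such a feature extractor; the specific features and keyword list are illustrative, not the paper's feature set:

```python
import re

# Illustrative keyword list; real detectors use much larger lexicons.
URGENCY_WORDS = {"urgent", "verify", "suspended", "immediately", "password"}

def extract_features(email_text: str) -> dict:
    """Extract simple surface features a classical ML classifier could use."""
    words = re.findall(r"[a-z']+", email_text.lower())
    return {
        "num_links": len(re.findall(r"https?://", email_text)),
        "num_urgency_words": sum(w in URGENCY_WORDS for w in words),
        "has_ip_url": bool(re.search(r"https?://\d{1,3}(?:\.\d{1,3}){3}", email_text)),
    }

features = extract_features(
    "URGENT: verify your password at http://192.168.0.1/login immediately"
)
print(features)
# {'num_links': 1, 'num_urgency_words': 4, 'has_ip_url': True}
```

Features like these feed a standard classifier (e.g., logistic regression or gradient boosting); the paper's point is that such surface cues are exactly what LLM-rephrased phishing emails can evade.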
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantized small-parameter LLMs for phishing detection
Zero-shot and few-shot prompting strategies evaluation
Lightweight LLMs with interpretable real-time explanations
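The zero-shot versus few-shot distinction above amounts to whether labeled examples are prepended to the classification prompt. A minimal sketch of both prompt styles; the wording and example emails are hypothetical, not taken from the paper:

```python
# Hypothetical few-shot examples; a real evaluation would draw these
# from the labeled training split.
FEWSHOT_EXAMPLES = [
    ("Your account is locked. Click http://bit.ly/x to restore access.", "phishing"),
    ("Minutes from Tuesday's standup are attached.", "legitimate"),
]

def build_prompt(email: str, few_shot: bool = False) -> str:
    """Build a zero-shot or few-shot classification prompt for an LLM."""
    lines = ["Classify the email as 'phishing' or 'legitimate'. Answer with one word."]
    if few_shot:
        for text, label in FEWSHOT_EXAMPLES:
            lines.append(f"Email: {text}\nLabel: {label}")
    lines.append(f"Email: {email}\nLabel:")
    return "\n\n".join(lines)

print(build_prompt("Confirm your payroll details here: http://pay-roll.example",
                   few_shot=True))
```

The same prompt can be extended to request a one-sentence justification, which is how lightweight LLMs can surface the interpretable, real-time explanations the paper highlights.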
Jikesh Thapa
School of Computer Science and Technology, Algoma University, Canada
Gurrehmat Chahal
School of Computer Science and Technology, Algoma University, Canada
Serban Voinea Gabreanu
School of Computer Science and Technology, Algoma University, Canada
Yazan Otoum
University of Ottawa
AIoT · Cybersecurity · Federated Learning