🤖 AI Summary
Phishing attacks are growing increasingly sophisticated, calling for detection methods that combine high accuracy, low computational overhead, and model interpretability. This paper systematically evaluates the performance trade-offs of traditional machine learning, deep learning, and quantized small-parameter large language models (e.g., DeepSeek R1 Distill Qwen 14B, Q8_0) for phishing email detection, with emphasis on zero-shot and few-shot prompting and decision explainability. We propose an LLM-based rewriting robustness evaluation, showing that LLM-rephrased emails can significantly degrade both ML and LLM-based detectors. Experiments show the quantized LLM achieves above 80% accuracy using only 17 GB of VRAM, trailing the best ML and DL baselines in raw accuracy while providing concise, interpretable explanations that support real-time decision-making. Our work outlines a viable path toward deployable, resource-efficient AI security systems for constrained environments.
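To make the rewriting-robustness evaluation concrete, here is a minimal sketch, not the paper's implementation: it assumes a local OpenAI-compatible endpoint (e.g., a llama.cpp server at `http://localhost:8080/v1`) hosting the quantized model, and a `detector` with a scikit-learn-style `predict()` interface; the prompt text, model id, and helper names are all illustrative.

```python
# Sketch of an LLM-based rewriting robustness check (illustrative only).
# Assumptions: a local OpenAI-compatible endpoint hosting the quantized
# model, and a detector exposing a scikit-learn-style predict() method.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

REWRITE_PROMPT = (
    "Rewrite the following email so it keeps the same intent and meaning "
    "but uses different wording, tone, and sentence structure:\n\n{email}"
)

def rephrase(email: str) -> str:
    """Ask the LLM to paraphrase an email (the adversarial rewrite step)."""
    resp = client.chat.completions.create(
        model="deepseek-r1-distill-qwen-14b-q8_0",  # placeholder model id
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(email=email)}],
        temperature=0.7,
    )
    return resp.choices[0].message.content

def robustness_drop(detector, emails: list[str], labels: list[int]) -> float:
    """Accuracy gap between original and LLM-rephrased emails."""
    rewritten = [rephrase(e) for e in emails]
    def acc(xs):
        preds = detector.predict(xs)
        return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
    return acc(emails) - acc(rewritten)
```

A large positive `robustness_drop` would indicate the vulnerability the summary describes: a detector whose accuracy collapses once phishing content is paraphrased by an LLM.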
📝 Abstract
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that balance high accuracy with computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform ML and DL methods in raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate zero-shot and few-shot prompting strategies and find that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models such as DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17 GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models' adversarial robustness and cost-performance trade-offs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.
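For readers curious what the zero-shot and few-shot prompting setup might look like in practice, the sketch below is an illustrative reconstruction rather than the paper's exact prompts: the endpoint, model id, system prompt, and few-shot examples are all assumptions.

```python
# Illustrative zero-/few-shot phishing classification prompt (not the
# paper's exact prompts). Assumes the same local OpenAI-compatible
# endpoint as in the previous sketch.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Hand-picked few-shot examples (hypothetical).
FEW_SHOT = [
    ("Your account is locked. Verify at http://secure-login.example now.", "phishing"),
    ("Minutes from Tuesday's project sync are attached for review.", "legitimate"),
]

def classify(email: str, few_shot: bool = True) -> str:
    """Return 'phishing' or 'legitimate' plus a one-sentence rationale."""
    messages = [{
        "role": "system",
        "content": ("You are an email security analyst. Answer with "
                    "'phishing' or 'legitimate', then a one-sentence reason."),
    }]
    if few_shot:  # few_shot=False gives the zero-shot variant
        for text, label in FEW_SHOT:
            messages.append({"role": "user", "content": text})
            messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": email})
    resp = client.chat.completions.create(
        model="deepseek-r1-distill-qwen-14b-q8_0",  # placeholder model id
        messages=messages,
        temperature=0.0,  # deterministic labels for benchmarking
    )
    return resp.choices[0].message.content
```

The one-sentence rationale in the response is what makes this style of detector attractive for the explainability goal the abstract emphasizes: each verdict arrives with a human-readable justification at no extra modeling cost.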