Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Phishing attacks are growing increasingly sophisticated, necessitating detection methods that simultaneously achieve high accuracy, low computational overhead, and model interpretability. This paper systematically evaluates the performance trade-offs of traditional machine learning, deep learning, and quantized lightweight large language models (e.g., DeepSeek R1 Distill Qwen 14B, Q8_0) for phishing email detection, with emphasis on zero-shot and few-shot prompting capabilities and decision explainability. We propose a novel LLM-based rewriting robustness evaluation paradigm to expose vulnerabilities of existing detectors to LLM-generated phishing content. Experimental results show that the quantized LLM achieves above 80% accuracy using only 17 GB of GPU memory and, although it trails the ML and DL baselines in raw accuracy, offers stronger adversarial robustness, sub-second inference latency, and real-time attribution. Our work establishes a viable pathway toward deployable, resource-efficient AI security systems for constrained environments.
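The 17 GB figure is consistent with a back-of-envelope estimate for an 8-bit quantized 14B model. A minimal sketch, assuming llama.cpp-style Q8_0 storage (roughly 8.5 bits per weight once per-block fp16 scales are counted) and an illustrative overhead allowance for KV cache and activations; none of these constants are taken from the paper:

```python
# Rough VRAM estimate for a Q8_0-quantized "14B" model.
# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale per block,
# i.e. about 8.5 bits per weight. All figures below are illustrative.

PARAMS = 14.8e9          # approximate parameter count of a 14B-class model
BITS_PER_WEIGHT = 8.5    # Q8_0: 8-bit weights + per-block fp16 scales

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
overhead_gb = 1.5        # assumed KV cache + activation buffers (context-dependent)
total_gb = weights_gb + overhead_gb

print(f"weights ≈ {weights_gb:.1f} GB, total ≈ {total_gb:.1f} GB")
```

With these assumptions the weights alone come to about 15.7 GB, and the total lands near the reported 17 GB footprint.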

📝 Abstract
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate the impact of zero-shot and few-shot prompting strategies, revealing that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models' adversarial robustness and cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.
Problem

Research questions and friction points this paper is trying to address.

Compare phishing detection accuracy of ML, DL, and quantized LLMs
Evaluate LLMs' potential for identifying context-based phishing cues
Assess cost-performance tradeoffs and adversarial robustness in detection
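Classical ML detectors of the kind compared here typically operate on hand-crafted surface features of an email. A minimal sketch of such a feature extractor; the specific features and keyword list are illustrative, not the paper's feature set:

```python
import re

# Illustrative keyword list; real detectors use much larger lexicons.
URGENCY_WORDS = {"urgent", "verify", "suspended", "immediately", "password"}

def extract_features(email_text: str) -> dict:
    """Extract simple surface features a classical ML classifier could use."""
    words = re.findall(r"[a-z']+", email_text.lower())
    return {
        "num_links": len(re.findall(r"https?://", email_text)),
        "num_urgency_words": sum(w in URGENCY_WORDS for w in words),
        "has_ip_url": bool(re.search(r"https?://\d{1,3}(?:\.\d{1,3}){3}", email_text)),
    }

features = extract_features(
    "URGENT: verify your password at http://192.168.0.1/login immediately"
)
print(features)
# {'num_links': 1, 'num_urgency_words': 4, 'has_ip_url': True}
```

Features like these feed a standard classifier (e.g., logistic regression or gradient boosting); the paper's point is that such surface cues are exactly what LLM-rephrased phishing emails can evade.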
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantized small-parameter LLMs for phishing detection
Zero-shot and few-shot prompting strategies evaluation
Lightweight LLMs with interpretable real-time explanations
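The zero-shot versus few-shot distinction above amounts to whether labeled examples are prepended to the classification prompt. A minimal sketch of both prompt styles; the wording and example emails are hypothetical, not taken from the paper:

```python
# Hypothetical few-shot examples; a real evaluation would draw these
# from the labeled training split.
FEWSHOT_EXAMPLES = [
    ("Your account is locked. Click http://bit.ly/x to restore access.", "phishing"),
    ("Minutes from Tuesday's standup are attached.", "legitimate"),
]

def build_prompt(email: str, few_shot: bool = False) -> str:
    """Build a zero-shot or few-shot classification prompt for an LLM."""
    lines = ["Classify the email as 'phishing' or 'legitimate'. Answer with one word."]
    if few_shot:
        for text, label in FEWSHOT_EXAMPLES:
            lines.append(f"Email: {text}\nLabel: {label}")
    lines.append(f"Email: {email}\nLabel:")
    return "\n\n".join(lines)

print(build_prompt("Confirm your payroll details here: http://pay-roll.example",
                   few_shot=True))
```

The same prompt can be extended to request a one-sentence justification, which is how lightweight LLMs can surface the interpretable, real-time explanations the paper highlights.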
Jikesh Thapa
School of Computer Science and Technology, Algoma University, Canada
Gurrehmat Chahal
School of Computer Science and Technology, Algoma University, Canada
Serban Voinea Gabreanu
School of Computer Science and Technology, Algoma University, Canada
Yazan Otoum
University of Ottawa
AIoT · Cybersecurity · Federated Learning