User-Centric Phishing Detection: A RAG and LLM-Based Approach

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a personalized phishing email detection framework that integrates retrieval-augmented generation (RAG) with large language models (LLMs), addressing the limitations of traditional methods and the high false-positive rates of standalone LLM-based classifiers. Combining user profiling with RAG, the approach retrieves a user’s historical legitimate emails and fuses them with real-time threat intelligence to construct an individualized context that guides the LLM's decision. Using open-source LLMs such as Llama4-Scout and DeepSeek-R1 alongside threat intelligence on domain and URL reputation, the best-reported configuration (Llama4-Scout) achieves an F1 score of 0.9703 on real-world email datasets and reduces false positives by 66.7% with RAG.
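To make the two headline metrics concrete, here is a minimal sketch of how an F1 score and a relative false-positive reduction are computed from confusion counts. The counts used below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the paper, and the function names are illustrative.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_fp_reduction(fp_before, fp_after):
    """Fraction of false positives removed, relative to the baseline count."""
    if fp_before <= 0:
        raise ValueError("baseline false-positive count must be positive")
    return (fp_before - fp_after) / fp_before


# Hypothetical counts (NOT from the paper), used only to show the arithmetic.
example_f1 = f1_score(tp=90, fp=10, fn=10)
example_reduction = relative_fp_reduction(fp_before=9, fp_after=3)
print(f"F1 = {example_f1:.4f}, FP reduction = {example_reduction:.1%}")
```

Note that the reduction is relative to the baseline: going from 9 to 3 false positives in this made-up example is a 66.7% reduction regardless of how many emails were evaluated.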

📝 Abstract

The escalating sophistication of phishing emails necessitates a shift beyond traditional rule-based and conventional machine-learning-based detectors. Although large language models (LLMs) offer strong natural language understanding, using them as standalone classifiers often yields elevated false-positive (FP) rates, mislabeling legitimate emails as phishing and creating a significant operational burden. This paper presents a personalized phishing detection framework that integrates LLMs with retrieval-augmented generation (RAG). For each message, the system constructs user-specific context by retrieving a compact set of the user's historical legitimate emails and enriching it with real-time domain and URL reputation from a cyber-threat intelligence platform, then conditions the LLM's decision on this evidence. We evaluate four open-source LLMs (Llama4-Scout, DeepSeek-R1, Mistral-Saba, and Gemma2) on an email dataset collected from public and institutional sources. Results show high performance; for example, Llama4-Scout attains an F1-score of 0.9703 and achieves a 66.7% reduction in FPs with RAG. These findings validate that a RAG-based, user-profiling approach is both feasible and effective for building high-precision, low-friction email security systems that adapt to individual communication patterns.
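The per-message flow described in the abstract (retrieve similar legitimate emails from the user's history, add domain reputation, then condition the LLM on both) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the bag-of-words cosine retriever stands in for whatever retriever the paper uses, `lookup_reputation` is a hypothetical stub backed by a local dictionary rather than a real threat-intelligence platform, the prompt wording is invented, and the LLM call itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_history(incoming, history, k=2):
    """Return the k historical legitimate emails most similar to the incoming one."""
    query = Counter(tokenize(incoming))
    ranked = sorted(history, key=lambda e: cosine(query, Counter(tokenize(e))), reverse=True)
    return ranked[:k]


def extract_domains(text):
    """Collect host names from http(s) URLs in the message body."""
    hosts = re.findall(r"https?://([A-Za-z0-9.-]+)", text)
    return sorted({h.lower().rstrip(".") for h in hosts})


def lookup_reputation(domain, intel):
    """Hypothetical reputation lookup; `intel` is a local dict, not a real platform API."""
    return intel.get(domain, "unknown")


def build_prompt(incoming, history, intel, k=2):
    """Assemble user-specific context and reputation evidence into one LLM prompt."""
    lines = ["You are an email security assistant.",
             "Examples of this user's legitimate emails:"]
    lines += [f"- {e}" for e in retrieve_history(incoming, history, k)]
    lines.append("Reputation of domains in the new email:")
    lines += [f"- {d}: {lookup_reputation(d, intel)}" for d in extract_domains(incoming)]
    lines += ["New email:", incoming,
              "Answer with exactly one word: phishing or legitimate."]
    return "\n".join(lines)


# Made-up example data, for illustration only.
history = [
    "Hi team, the weekly project sync is moved to Thursday at 10am.",
    "Your payroll statement for March is available on the HR portal.",
    "Lunch menu for the cafeteria this week is attached.",
]
intel = {"payro11-update.example.net": "malicious"}
incoming = ("Urgent: your payroll statement is on hold, "
            "verify at https://payro11-update.example.net/login")

prompt = build_prompt(incoming, history, intel)
print(prompt)
```

The resulting `prompt` string would then be passed to whichever LLM is being evaluated. Keeping retrieval, reputation lookup, and prompt assembly as separate functions makes it easy to swap in a vector store or a real reputation service without touching the rest of the flow.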

Problem

Research questions and friction points this paper is trying to address.

phishing detection

false positives

large language models

email security

user-centric

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)

Large Language Models (LLMs)

Personalized Phishing Detection

User-Centric Security

False Positive Reduction
