PhishLang: A Real-Time, Fully Client-Side Phishing Detection Framework Using MobileBERT

📅 2024-08-11

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

This paper targets three weaknesses of real-time client-side phishing detection: reliance on external blocklists, privacy leakage, and poor coverage of zero-day attacks. It proposes what the authors describe as the first fully local, low-latency, end-to-end detection framework. Two lightweight MobileBERT models, one for the URL string and one for the webpage source code, are combined in an ensemble so that each page is analyzed in context and offline; the pipeline makes no network requests and uploads no data, with a reported end-to-end latency under 300 ms. Key contributions are: (1) a fully offline, client-side architecture that integrates both inputs; (2) detection of zero-day phishing pages that heuristic and cloud-based approaches tend to miss; and (3) an evaluation in a real browser environment, as a Chromium extension, reporting higher accuracy than popular anti-phishing tools while preserving privacy, real-time responsiveness, and generalization to unseen attacks.

📝 Abstract

In this paper, we introduce PhishLang, the first fully client-side anti-phishing framework, built on a lightweight ensemble of advanced language models that analyze the contextual features of a website's source code and URL. Unlike traditional heuristic or machine learning approaches that rely on static features and struggle to adapt to evolving threats, or deep learning models that are computationally intensive, our approach utilizes MobileBERT, a fast and memory-efficient variant of the BERT architecture, to capture nuanced features indicative of phishing attacks. To further enhance detection accuracy, PhishLang employs a multi-modal ensemble approach, combining the URL and Source detection models. This architecture ensures robustness by allowing one model to compensate in scenarios where the other may fail, or where both models provide ambiguous inferences. As a result, PhishLang excels at detecting both regular and evasive phishing threats, including zero-day attacks, and outperforms popular anti-phishing tools, while operating without external blocklists and safeguarding user privacy by ensuring that browser history remains entirely local and unshared. We release PhishLang as a Chromium browser extension and also open-source the framework to aid the research community.
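
The abstract does not spell out how the URL and Source model outputs are combined. As a rough illustration only, the Python sketch below shows one plausible way a two-model ensemble could let a confident model compensate for an uncertain one. The function name `ensemble_verdict`, the `Verdict` type, and the thresholds (0.9, 0.1, 0.5) are all assumptions made for this example, not values taken from PhishLang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    label: str       # "phishing" or "benign"
    score: float     # combined phishing probability in [0, 1]
    decided_by: str  # "url", "source", or "both"


def ensemble_verdict(url_score: float, source_score: float,
                     high: float = 0.9, low: float = 0.1,
                     threshold: float = 0.5) -> Verdict:
    """Combine two phishing probabilities (hypothetical rule, not PhishLang's).

    A model counts as "confident" when its score is >= high or <= low.
    If exactly one model is confident, its score decides the outcome.
    Otherwise (both confident or both ambiguous) the scores are averaged.
    """
    for name, score in (("url_score", url_score), ("source_score", source_score)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")

    def is_confident(score: float) -> bool:
        return score >= high or score <= low

    url_conf = is_confident(url_score)
    source_conf = is_confident(source_score)

    if url_conf and not source_conf:
        combined, decided_by = url_score, "url"
    elif source_conf and not url_conf:
        combined, decided_by = source_score, "source"
    else:
        combined, decided_by = (url_score + source_score) / 2, "both"

    label = "phishing" if combined >= threshold else "benign"
    return Verdict(label=label, score=combined, decided_by=decided_by)


if __name__ == "__main__":
    # URL model is confident, Source model is ambiguous: the URL model decides.
    print(ensemble_verdict(0.97, 0.55))
    # Both models are ambiguous: fall back to the averaged score.
    print(ensemble_verdict(0.60, 0.70))
```

In a real deployment the two scores would come from the URL and Source MobileBERT classifiers running locally in the browser; here they are plain floats so the decision logic can be read in isolation.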

Problem

Research questions and friction points this paper is trying to address.

Detects phishing websites using client-side analysis

Combines URL and source code features for accuracy

Ensures privacy with local processing, no blocklists

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MobileBERT for efficient phishing detection

Combines URL and source code multi-modal ensemble

Operates fully client-side ensuring user privacy

🔎 Similar Papers

PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection