🤖 AI Summary
To address key challenges in real-time client-side phishing detection (reliance on external blacklists, privacy leakage, and difficulty identifying zero-day attacks), this paper proposes the first fully local, low-latency, end-to-end detection framework. Methodologically, it introduces an ensemble of two lightweight MobileBERT models, one processing URL strings and one processing webpage source code, whose outputs are combined to enable context-aware, offline multimodal analysis. The entire pipeline makes no network requests and uploads no data, achieving end-to-end latency under 300 ms. Key contributions include: (1) the first completely offline, client-side multimodal detection architecture; (2) the ability to detect zero-day phishing attacks, overcoming limitations of heuristic and cloud-based approaches; and (3) empirical validation in a real browser environment (as a Chromium extension), showing higher accuracy than popular anti-phishing tools while preserving privacy, real-time responsiveness, and generalization ability.
📝 Abstract
In this paper, we introduce PhishLang, the first fully client-side anti-phishing framework, built on a lightweight ensemble of language models that analyze the contextual features of a website's source code and URL. Unlike traditional heuristic or machine learning approaches, which rely on static features and struggle to adapt to evolving threats, or deep learning models, which are computationally intensive, our approach uses MobileBERT, a fast and memory-efficient variant of the BERT architecture, to capture nuanced features indicative of phishing attacks. To further improve detection accuracy, PhishLang employs a multimodal ensemble that combines the URL and source-code detection models. This architecture improves robustness by allowing one model to compensate when the other fails, and by handling cases in which both models produce ambiguous predictions. As a result, PhishLang effectively detects both regular and evasive phishing threats, including zero-day attacks, and outperforms popular anti-phishing tools. It operates without external blocklists and safeguards user privacy: browser history remains entirely local and is never shared. We release PhishLang as a Chromium browser extension and open-source the framework to aid the research community.
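The abstract does not say how PhishLang combines the outputs of its URL and source-code models. As a hedged illustration of the compensation behavior it describes, the Python sketch below assumes each branch emits a phishing probability (or `None` if that branch failed). The thresholds (`PHISH_THRESHOLD`, `AMBIGUITY_MARGIN`), the `ModelOutput` type, the `ensemble_verdict` function, and the weighted-average rule are all hypothetical choices for this example, not the paper's method.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; PhishLang's actual thresholds are not given in the abstract.
PHISH_THRESHOLD = 0.5
AMBIGUITY_MARGIN = 0.15


@dataclass
class ModelOutput:
    """Phishing probability from one branch, or None if that branch failed
    (e.g. the page source could not be analyzed)."""
    phish_prob: Optional[float]


def is_ambiguous(score: float, margin: float = AMBIGUITY_MARGIN) -> bool:
    # A score close to the decision threshold is treated as uncertain.
    return abs(score - PHISH_THRESHOLD) < margin


def ensemble_verdict(url: ModelOutput, source: ModelOutput,
                     url_weight: float = 0.5) -> str:
    """Combine URL-model and source-model scores into a single label.

    Hypothetical rule: fall back to whichever branch is available, defer to a
    single confident branch, otherwise take a weighted average.
    """
    if not 0.0 < url_weight < 1.0:
        raise ValueError("url_weight must be strictly between 0 and 1")
    weighted = [(url.phish_prob, url_weight),
                (source.phish_prob, 1.0 - url_weight)]
    available = [(s, w) for s, w in weighted if s is not None]
    if not available:
        return "unknown"  # neither branch produced a score
    if len(available) == 1:
        combined = available[0][0]  # one branch compensates for the failed one
    else:
        confident = [(s, w) for s, w in available if not is_ambiguous(s)]
        # If exactly one branch is confident, trust it; otherwise (both
        # confident or both ambiguous) average all available scores.
        pool = confident if len(confident) == 1 else available
        total_weight = sum(w for _, w in pool)
        combined = sum(s * w for s, w in pool) / total_weight
    return "phishing" if combined >= PHISH_THRESHOLD else "benign"
```

For example, `ensemble_verdict(ModelOutput(0.55), ModelOutput(0.05))` defers to the confident source branch and returns `"benign"`. A real implementation might instead learn the combination weights or use a meta-classifier; the weighted average is chosen here only because it makes the fallback logic easy to follow.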