🤖 AI Summary
Problem: Existing QR code phishing (Quishing) detection methods rely on URL extraction, which exposes users to privacy risks and cannot cover QR codes that carry non-URL payloads (e.g., Wi-Fi configuration, payment codes).
Method: This paper proposes the first purely vision-based malicious QR code detection framework, which bypasses content parsing entirely and directly models structural and pixel-level patterns. We design a lightweight structural feature engineering pipeline and integrate it with interpretable tree-based models (e.g., XGBoost) for classification. Crucially, we empirically establish a strong correlation between structural anomalies and phishing risk, and use feature importance to guide pixel-level pruning, which shrinks the feature space and improves performance.
Contribution/Results: Our XGBoost model achieves an AUC of 0.9106 on the full feature set, rising to 0.9133 after non-informative pixels are pruned. We construct and publicly release the first fully annotated dataset of phishing and benign QR codes. Experiments validate the feasibility and effectiveness of a QR-centric, content-agnostic detection paradigm.
📝 Abstract
The rise of QR code-based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires extracting the QR code payload and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual indicators of malicious intent and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR codes correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.
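To make the pixel-pruning idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each QR code is available as a 21×21 grid of 0/1 modules (the size of a version 1 code), flattens the grid into a feature vector, and scores each pixel by the absolute difference of its class means. That score is a deliberately crude stand-in for the model-derived feature importance (e.g., from XGBoost) described in the abstract. The data is synthetic, the labels are made up, and every function name is hypothetical.

```python
import random

SIZE = 21  # modules per side of a version 1 QR code


def finder_pattern_value(r, c):
    """Module value inside a 7x7 finder pattern (1 = dark)."""
    on_border = r in (0, 6) or c in (0, 6)
    in_core = 2 <= r <= 4 and 2 <= c <= 4
    return 1 if on_border or in_core else 0


def synthetic_code(rng):
    """Toy stand-in for a QR code: fixed top-left finder pattern, random elsewhere."""
    grid = [[rng.randint(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
    for r in range(7):
        for c in range(7):
            grid[r][c] = finder_pattern_value(r, c)
    return grid


def flatten(grid):
    """Turn the module grid into one flat feature vector, row by row."""
    return [bit for row in grid for bit in row]


def pixel_scores(vectors, labels):
    """Score each pixel by |mean over class 1 - mean over class 0|.

    Assumption: this replaces the tree-model feature importance used in the
    paper with a much simpler statistic, purely for illustration.
    """
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    scores = []
    for i in range(len(vectors[0])):
        mean_pos = sum(v[i] for v in pos) / len(pos)
        mean_neg = sum(v[i] for v in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores


def keep_indices(scores, threshold=0.0):
    """Indices of pixels whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def prune(vector, keep):
    """Drop every pixel that is not listed in `keep`."""
    return [vector[i] for i in keep]


rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # made-up labels: 1 = phishing, 0 = benign
vectors = [flatten(synthetic_code(rng)) for _ in labels]

scores = pixel_scores(vectors, labels)
keep = keep_indices(scores)
pruned = [prune(v, keep) for v in vectors]

# Pixels of the fixed finder pattern are identical in every sample,
# so their score is exactly 0 and they are always pruned.
finder_indices = {r * SIZE + c for r in range(7) for c in range(7)}
print(len(vectors[0]), "->", len(keep), "features kept")
```

Because the finder pattern is the same in every code, its pixels carry no class signal and drop out under any importance measure. In a real pipeline the scores would come from the trained model rather than from class means, and the threshold would be tuned against validation AUC.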