🤖 AI Summary
This work addresses the insufficient robustness of deep learning–based spam filters against adversarial attacks. We systematically evaluate six mainstream models—including LSTM and BERT—under word-level, character-level, sentence-level, and AI-generated paragraph-level attacks. To enhance attack efficacy, we propose a dual-weight scoring mechanism integrating domain-informed spam weights and attention weights, enabling targeted multi-granularity adversarial example generation. Our attack framework unifies gradient-based, substitution-based, and generative strategies to induce cross-granularity perturbations. Experimental results show that all evaluated attacks achieve an average success rate exceeding 78% across models; notably, we identify, for the first time, critical failure modes under semantically invariant perturbations. The study delivers a reproducible evaluation framework and empirical evidence to guide robustness enhancement in spam detection systems.
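The dual-weight scoring idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the token list, the spam-weight lexicon, the attention values, and the convex mixing formula with `alpha` are all assumptions made for the example.

```python
# Hypothetical sketch of dual-weight scoring: mix a domain-informed
# spam weight with a model attention weight to rank tokens as
# perturbation targets. All names and values below are illustrative.

def dual_weight_scores(tokens, spam_weights, attention_weights, alpha=0.5):
    """Score each token by a convex mix of spam weight and attention weight."""
    scores = []
    for tok, attn in zip(tokens, attention_weights):
        spam = spam_weights.get(tok.lower(), 0.0)  # 0.0 for tokens outside the lexicon
        scores.append(alpha * spam + (1.0 - alpha) * attn)
    return scores

def top_k_targets(tokens, scores, k=2):
    """Return the k highest-scoring tokens -- the first candidates to perturb."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]

tokens = ["claim", "your", "free", "prize", "now"]
spam_weights = {"free": 0.9, "prize": 0.8, "claim": 0.4}  # assumed domain lexicon
attention = [0.10, 0.05, 0.35, 0.30, 0.20]                # assumed attention weights

scores = dual_weight_scores(tokens, spam_weights, attention)
print(top_k_targets(tokens, scores))  # → ['free', 'prize']
```

With the assumed values, "free" and "prize" score highest under the combined criterion, so a multi-granularity attack would perturb those tokens first (e.g. character swaps or synonym substitution).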
📝 Abstract
Deep learning has revolutionized email filtering, which is critical to protecting users from cyber threats such as spam, malware, and phishing. However, the increasing sophistication of adversarial attacks poses a significant challenge to the effectiveness of these filters. This study investigates the impact of adversarial attacks on deep learning-based spam detection systems using real-world datasets. Six prominent deep learning models are evaluated on these datasets, with attacks analyzed at the word, character, sentence, and AI-generated paragraph levels. Novel scoring functions, including spam weights and attention weights, are introduced to improve attack effectiveness. This comprehensive analysis sheds light on the vulnerabilities of spam filters and contributes to efforts to improve their security against evolving adversarial threats.