Leveraging Large Language Models for Cybersecurity: Enhancing SMS Spam Detection with Robust and Context-Aware Text Classification

📅 2025-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses SMS spam detection by systematically evaluating six classifiers (including Naive Bayes, SVM, and DNN), each combined with two text representation schemes: Bag-of-Words (BoW) and TF-IDF. Methodologically, it benchmarks every classifier-feature combination on the same task. Results show that TF-IDF outperforms BoW for all six classifiers. Naive Bayes with TF-IDF achieves the highest accuracy (96.2%; precision of 0.976 for ham and 0.754 for spam). SVM with TF-IDF attains 94.5% accuracy. DNN with TF-IDF reaches 91.0% accuracy with a ham recall of 0.991 but a spam recall of only 0.415. The work contributes a reproducible, lightweight baseline for SMS anti-spam systems, along with guidance on pairing classifiers with feature representations.
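Per-class precision and recall can diverge sharply from overall accuracy when classes are imbalanced. A minimal Python sketch of the standard definitions follows. The function name and the toy labels are hypothetical, chosen only to show a classifier with perfect ham recall that still misses half the spam, the same pattern behind the DNN's 0.991 ham recall and 0.415 spam recall reported in the abstract.

```python
def per_class_metrics(y_true, y_pred, labels=("ham", "spam")):
    """Return overall accuracy and per-class precision and recall.

    precision(c) = TP(c) / (number of messages predicted as c)
    recall(c)    = TP(c) / (number of messages whose true label is c)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    report = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        report[c] = {
            # Guard against division by zero when a class is never predicted or absent.
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, report


# Hypothetical toy labels: 8 ham and 2 spam; the classifier labels one spam as ham.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 8 + ["ham", "spam"]
accuracy, report = per_class_metrics(y_true, y_pred)
print(accuracy)        # 0.9
print(report["ham"])   # recall 1.0, precision 8/9
print(report["spam"])  # precision 1.0, recall 0.5
```

Accuracy stays at 90% even though half the spam slips through, which is why the per-class figures matter more than the headline accuracy for a spam filter.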


📝 Abstract

This study evaluates the effectiveness of different feature extraction techniques and classification algorithms in detecting spam messages within SMS data. We analyzed six classifiers (Naive Bayes, K-Nearest Neighbors, Support Vector Machines, Linear Discriminant Analysis, Decision Trees, and Deep Neural Networks) using two feature extraction methods: bag-of-words and TF-IDF. The primary objective was to determine the most effective classifier-feature combination for SMS spam detection. Our research offers two main contributions: first, a systematic examination of classifier and feature extraction pairings, and second, an empirical evaluation of their ability to distinguish spam messages. Our results demonstrate that the TF-IDF method consistently outperforms the bag-of-words approach across all six classifiers. Specifically, Naive Bayes with TF-IDF achieved the highest accuracy, 96.2%, with a precision of 0.976 for non-spam and 0.754 for spam messages. Support Vector Machines with TF-IDF reached an accuracy of 94.5%, with a precision of 0.926 for non-spam and 0.891 for spam. Deep Neural Networks using TF-IDF yielded an accuracy of 91.0%, with a recall of 0.991 for non-spam but only 0.415 for spam messages. In contrast, K-Nearest Neighbors, Linear Discriminant Analysis, and Decision Trees performed worse regardless of the feature extraction method employed. We also observed substantial variability in classifier effectiveness depending on the chosen feature extraction technique. These findings emphasize the importance of feature extraction in SMS spam detection and suggest that TF-IDF, when paired with Naive Bayes, Support Vector Machines, or Deep Neural Networks, provides the most reliable performance. They provide a foundation for improving SMS spam detection through optimized feature extraction and classification methods.

Problem

Research questions and friction points this paper is trying to address.

Evaluating classifiers for SMS spam detection

Comparing bag-of-words and TF-IDF feature extraction

Identifying optimal classifier-feature extraction combinations
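To make the bag-of-words versus TF-IDF comparison concrete, here is a minimal pure-Python sketch of both representations. It assumes a simple lowercase alphanumeric tokenizer and the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with per-document L2 normalisation, which mirrors scikit-learn's default TfidfVectorizer weighting. The paper's exact preprocessing and weighting variant are not stated here, and the sample messages are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Raw term counts per document over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tf_idf(docs):
    """Counts reweighted by smoothed IDF, then L2-normalised per document."""
    vocab, counts = bag_of_words(docs)
    n = len(docs)
    # Document frequency: in how many messages each term appears at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


# Hypothetical sample messages.
docs = [
    "free prize call now",
    "call me later",
    "free free entry now",
]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
print(vocab)   # ['call', 'entry', 'free', 'later', 'me', 'now', 'prize']
print(bow[2])  # [0, 1, 2, 0, 0, 1, 0]
```

Both schemes share the vocabulary; the difference is that TF-IDF down-weights terms that occur in many messages. In the first message, "prize" and "call" each appear once, but "prize" receives the larger weight because "call" also appears in another message.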

Innovation

Methods, ideas, or system contributions that make the work stand out.

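TF-IDF with Naive Bayes: highest accuracy (96.2%)

TF-IDF with Support Vector Machines: 94.5% accuracy (precision 0.926 non-spam, 0.891 spam)

TF-IDF with Deep Neural Networks: 91.0% accuracy and 0.991 non-spam recall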
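The TF-IDF with Naive Bayes pairing can be sketched end to end in plain Python. This is a minimal multinomial Naive Bayes with add-one smoothing, fed L2-normalised TF-IDF rows built with a scikit-learn-style smoothed IDF. The tokenizer, the helper names (`fit_tfidf`, `transform`), the toy messages, and the labels are hypothetical illustrations, not the paper's preprocessing, dataset, or hyperparameters.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_tfidf(docs):
    """Learn the vocabulary and smoothed IDF weights from training messages."""
    toks = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*toks))
    n = len(docs)
    idf = [math.log((1 + n) / (1 + sum(t in ts for ts in toks))) + 1 for t in vocab]
    return vocab, idf


def transform(docs, vocab, idf):
    """Map messages to L2-normalised TF-IDF rows; unseen words are ignored."""
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for t in tokenize(d):
            if t in index:
                row[index[t]] += 1.0
        row = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha smoothing over non-negative features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        sums = {c: [0.0] * n_features for c in self.classes_}
        counts = defaultdict(int)
        for row, label in zip(X, y):
            counts[label] += 1
            for j, v in enumerate(row):
                sums[label][j] += v
        self.log_prior_ = {c: math.log(counts[c] / len(y)) for c in self.classes_}
        self.log_lik_ = {}
        for c in self.classes_:
            total = sum(sums[c]) + self.alpha * n_features
            self.log_lik_[c] = [math.log((s + self.alpha) / total) for s in sums[c]]
        return self

    def predict(self, X):
        # Score each class by log prior plus feature-weighted log likelihoods.
        preds = []
        for row in X:
            scores = {
                c: self.log_prior_[c] + sum(v * w for v, w in zip(row, self.log_lik_[c]))
                for c in self.classes_
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Hypothetical toy messages, not the paper's dataset.
train_docs = [
    "free prize call now to claim",
    "win cash free entry text now",
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at lunch tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham", "ham"]

vocab, idf = fit_tfidf(train_docs)
model = MultinomialNB(alpha=1.0).fit(transform(train_docs, vocab, idf), train_labels)
test_docs = ["claim your free cash prize", "lunch tomorrow at home"]
preds = model.predict(transform(test_docs, vocab, idf))
print(preds)  # ['spam', 'ham']
```

The vocabulary and IDF weights are fitted on training messages only and reused for new messages, so no test-set statistics leak into the features. Feeding fractional TF-IDF weights to a multinomial model is a common practical choice (scikit-learn's MultinomialNB documentation notes that tf-idf values may also work), although the model is formally defined over integer counts.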
