🤖 AI Summary
This study addresses the challenge of rapidly detecting fake news propagation on online platforms by designing a lightweight, efficient detection system. We comparatively evaluate shallow models—SVM combined with BoW, TF-IDF, and Word2Vec embeddings—against fine-tuned BERT-base on a binary news veracity classification task. Experimental results show that BoW+SVM achieves 99.81% accuracy and an F1-score of 0.9980, closely approaching BERT-base’s 99.98% accuracy and 0.9998 F1-score, while drastically reducing inference latency and computational overhead. These findings empirically demonstrate that carefully engineered traditional models remain highly competitive in low-resource, latency-critical scenarios—challenging the prevailing assumption that larger models inherently outperform smaller ones. The work provides both theoretical insight and practical deployment guidance for real-world fake news detection systems operating under strict computational constraints.
📝 Abstract
The rapid spread of misinformation, particularly through online platforms, underscores the urgent need for reliable detection systems. This study explores the use of machine learning and natural language processing, specifically Support Vector Machines (SVM) and BERT, to detect fake news. We employ three distinct text vectorization methods for SVM: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bag of Words (BoW), evaluating their effectiveness in distinguishing genuine from fake news. Additionally, we compare these methods against the transformer-based language model BERT. Our comprehensive approach includes detailed preprocessing steps, rigorous model implementation, and thorough evaluation to determine the most effective techniques. The results demonstrate that while BERT achieves superior performance with 99.98% accuracy and an F1-score of 0.9998, the SVM model with a linear kernel and BoW vectorization also performs exceptionally well, achieving 99.81% accuracy and an F1-score of 0.9980. These findings highlight that, despite BERT's edge, SVM models with BoW and TF-IDF vectorization come remarkably close, offering highly competitive performance at substantially lower computational cost.
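The vectorizer-plus-SVM setup described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: the toy corpus, labels, and default hyperparameters below are assumptions, and the paper's actual preprocessing pipeline is not reproduced here.

```python
# Hypothetical sketch of the BoW / TF-IDF + linear-kernel SVM comparison.
# Toy sentences stand in for the real news corpus (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny stand-in corpus: label 1 = genuine, 0 = fake (invented examples)
texts = [
    "officials confirm the report in a press briefing",
    "scientists publish a peer-reviewed study on vaccines",
    "shocking miracle cure doctors do not want you to know",
    "you will not believe this one weird secret trick",
]
labels = [1, 1, 0, 0]

# One pipeline per vectorization method; a Word2Vec variant would instead
# average pretrained word embeddings before feeding the SVM.
pipelines = {
    "BoW": make_pipeline(CountVectorizer(), LinearSVC()),
    "TF-IDF": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

for name, pipe in pipelines.items():
    pipe.fit(texts, labels)
    pred = pipe.predict(["miracle trick doctors hate"])[0]
    print(f"{name}: predicted label {pred}")
```

In a real evaluation the corpus would be split into train and test sets, and accuracy/F1 computed on held-out data; here the pipelines merely demonstrate how each vectorizer slots in front of the same linear-kernel classifier.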