🤖 AI Summary
This paper addresses the insufficient evaluation of embedding-based text anomaly detection (e.g., identifying spam, misinformation, and profanity) by introducing TAD-Bench, the first comprehensive benchmark for this task. It decouples the embedding stage from the detection stage and evaluates combinations of diverse text embeddings (BERT, RoBERTa, Sentence-BERT, and LLM-derived embeddings) with classical and deep anomaly detection algorithms (Isolation Forest, OC-SVM, DeepSVDD, GOAD). TAD-Bench covers cross-domain, multi-granularity real-world datasets and provides a unified framework for measuring how embeddings and detection algorithms interact, revealing consistent patterns in how embedding characteristics align with task granularity. Experiments show that higher-quality embeddings do not necessarily yield better detection performance; the best embedding–algorithm combinations improve F1 scores by up to 12.7% on fine-grained tasks. The project releases an open-source, reproducible framework and benchmark suite, establishing a standardized evaluation foundation for text anomaly detection.
📝 Abstract
Text anomaly detection is crucial for identifying spam, misinformation, and offensive language in natural language processing tasks. Despite the growing adoption of embedding-based methods, their effectiveness and generalizability across diverse application scenarios remain under-explored. To address this, we present TAD-Bench, a comprehensive benchmark designed to systematically evaluate embedding-based approaches for text anomaly detection. TAD-Bench integrates multiple datasets spanning different domains, combining state-of-the-art embeddings from large language models with a variety of anomaly detection algorithms. Through extensive experiments, we analyze the interplay between embeddings and detection methods, uncovering their strengths, weaknesses, and applicability to different tasks. These findings offer new perspectives on building more robust, efficient, and generalizable anomaly detection systems for real-world applications.
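The interplay between embeddings and detection methods can be pictured as a grid evaluation: each embedder is paired with each detector, the detector is fitted on normal text only, and every pair is scored on held-out text. The Python sketch below illustrates that structure under loud assumptions and is not the TAD-Bench implementation. The embedders (hashed word and character-trigram counts) and the detectors (centroid distance and k-nearest-neighbour distance) are simple standard-library stand-ins for the LLM embeddings and anomaly detection algorithms the benchmark combines, AUROC is chosen here only as a convenient threshold-free score, and all function names and toy texts are hypothetical.

```python
import math
import zlib
from itertools import product


def _bucket(token, dim):
    # crc32 is deterministic across runs, unlike the built-in hash() for str.
    return zlib.crc32(token.encode("utf-8")) % dim


def _normalise(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def word_embed(text, dim=64):
    """Stand-in embedder: L2-normalised hashed bag of words."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[_bucket(tok, dim)] += 1.0
    return _normalise(vec)


def trigram_embed(text, dim=64):
    """Stand-in embedder: L2-normalised hashed character trigrams."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[_bucket(s[i:i + 3], dim)] += 1.0
    return _normalise(vec)


def centroid_detector(train):
    """Score = Euclidean distance to the mean of the normal training vectors."""
    dim = len(train[0])
    centre = [sum(v[j] for v in train) / len(train) for j in range(dim)]
    return lambda x: math.dist(x, centre)


def knn_detector(train, k=2):
    """Score = distance to the k-th nearest normal training vector."""
    def score(x):
        dists = sorted(math.dist(x, t) for t in train)
        return dists[min(k, len(dists)) - 1]
    return score


def auroc(anomaly_scores, normal_scores):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            wins += 1.0 if a > n else 0.5 if a == n else 0.0
    return wins / (len(anomaly_scores) * len(normal_scores))


def run_grid(train_normal, test_normal, test_anomalous, embedders, detectors):
    """Evaluate every embedder/detector pair; detectors see normal text only."""
    results = {}
    for (e_name, embed), (d_name, make_detector) in product(
        embedders.items(), detectors.items()
    ):
        score = make_detector([embed(t) for t in train_normal])
        results[(e_name, d_name)] = auroc(
            [score(embed(t)) for t in test_anomalous],
            [score(embed(t)) for t in test_normal],
        )
    return results


EMBEDDERS = {"word": word_embed, "trigram": trigram_embed}
DETECTORS = {"centroid": centroid_detector, "knn": knn_detector}

# Hypothetical toy data, for illustration only.
TRAIN_NORMAL = [
    "can we move the meeting to friday",
    "the meeting notes are in the shared folder",
    "please review the agenda before the meeting",
    "friday works for the project review",
]
TEST_NORMAL = ["the project meeting is on friday", "please share the agenda notes"]
TEST_ANOMALOUS = ["win a free prize now click here", "cheap pills limited offer buy now"]

if __name__ == "__main__":
    grid = run_grid(TRAIN_NORMAL, TEST_NORMAL, TEST_ANOMALOUS, EMBEDDERS, DETECTORS)
    for (e_name, d_name), value in sorted(grid.items()):
        print(f"{e_name:8s} x {d_name:8s} AUROC = {value:.2f}")
```

Because embedders and detectors are plain callables held in dictionaries, replacing a stand-in with a real sentence encoder or an Isolation Forest wrapper would only require adding an entry; the evaluation loop itself would stay the same.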