Towards Token-Level Text Anomaly Detection

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing text anomaly detection methods: they operate almost exclusively at the document level and therefore cannot localize the anomalous segments within a text. The authors introduce a token-level text anomaly detection task and construct three benchmark datasets with token-level annotations covering spam, reviews, and grammar errors. They propose a unified detection framework that supports both document-level and token-level anomaly identification. In experiments on the new datasets, the framework outperforms six baseline models. The code and datasets are publicly released to support further research in fine-grained text anomaly detection.

📝 Abstract
Despite significant progress in text anomaly detection for web applications such as spam filtering and fake news detection, existing methods are fundamentally limited to document-level analysis and cannot identify which specific parts of a text are anomalous. We introduce token-level anomaly detection, a novel paradigm that enables fine-grained localization of anomalies within text. We formally define text anomalies at both the document and token levels, and propose a unified detection framework that operates across multiple levels. To facilitate research in this direction, we collect and annotate three benchmark datasets spanning spam, reviews, and grammar errors with token-level labels. Experimental results demonstrate that our framework outperforms six baselines, opening new possibilities for precise anomaly localization in text. All code and data are publicly available at https://github.com/charles-cao/TokenCore.
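To make the difference between the two granularities concrete, here is a minimal Python sketch of what token-level labels add over a single document-level label. It is not taken from the paper or the TokenCore repository. The `LabeledText` class, its field names, and the rule that a document counts as anomalous when at least one token is anomalous are all illustrative assumptions; the paper's formal definitions may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledText:
    """Hypothetical container: one text with a binary label per token."""
    tokens: List[str]
    token_labels: List[int]  # 1 = anomalous token, 0 = normal token

    def document_label(self) -> int:
        # Assumption for illustration: a document is anomalous
        # if any of its tokens is anomalous.
        return int(any(self.token_labels))

    def anomalous_spans(self) -> List[str]:
        # Group consecutive anomalous tokens into contiguous spans.
        spans, current = [], []
        for token, label in zip(self.tokens, self.token_labels):
            if label:
                current.append(token)
            elif current:
                spans.append(" ".join(current))
                current = []
        if current:
            spans.append(" ".join(current))
        return spans


# Made-up example: a review with a spam-like tail.
example = LabeledText(
    tokens=["Great", "phone", ",", "click", "here", "to", "win", "cash"],
    token_labels=[0, 0, 0, 1, 1, 1, 1, 1],
)
print(example.document_label())   # document-level view: one label
print(example.anomalous_spans())  # token-level view: localized span(s)
```

A document-level detector would only produce the single label, whereas token-level labels also indicate where in the text the anomaly lies.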
Problem

Research questions and friction points this paper is trying to address.

text anomaly detection
token-level
anomaly localization
fine-grained detection
web applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

token-level anomaly detection
fine-grained localization
text anomaly detection
multi-level detection framework
annotated benchmark datasets
Yang Cao
Great Bay University, Tsinghua University
Bicheng Yu
Great Bay University, Shenzhen University
Sikun Yang
Great Bay University, Dongguan Key Laboratory for AI and Dynamical Systems
Ming Liu
Senior Lecturer of Machine Learning, Deakin University
Natural Language Processing, Machine Learning, Human-centered AI
Yujiu Yang
SIGS, Tsinghua University
Machine Learning, Natural Language Processing, Computer Vision