mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multivariate time-series anomaly detection (MTS-AD) is hampered by complex inter-variable dependencies, strong temporal dynamics, and severe label scarcity, and the field has long lacked a standardized benchmark for fair method evaluation and model selection. To address this, the paper introduces mTSBench, the largest publicly available benchmark to date for MTS-AD and unsupervised model selection: it spans 19 datasets, 344 labeled sequences, and 12 application domains; covers 24 detectors, including the first systematic evaluation of LLM-based methods; and provides a unified preprocessing pipeline, an LLM-driven anomaly scoring mechanism, a standardized evaluation protocol, and multi-dimensional metrics (F1, AUC, latency). Key findings confirm that no single detector dominates across datasets, and the best model selection strategy reaches only 63.2% of oracle performance on average. All data, code, and tools are fully open-sourced to foster reproducible, equitable research.
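For reference, here is a minimal sketch (not from the paper's codebase) of how the reported per-sequence metrics could be computed for one detector; the `detector.score` interface and the fixed 0.5 threshold are illustrative assumptions.

```python
import time
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

def evaluate_detector(detector, X, labels, threshold=0.5):
    """Score one labeled multivariate sequence with an already-fitted detector.

    X      : (T, D) array -- T timesteps, D variables.
    labels : (T,) binary ground-truth anomaly labels.
    Returns point-wise F1, ROC-AUC, and wall-clock scoring latency in seconds.
    """
    start = time.perf_counter()
    scores = np.asarray(detector.score(X))  # assumed per-timestep anomaly scores
    latency = time.perf_counter() - start

    preds = (scores >= threshold).astype(int)  # illustrative fixed threshold
    return {
        "f1": f1_score(labels, preds),
        "auc": roc_auc_score(labels, scores),
        "latency_s": latency,
    }
```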

📝 Abstract
Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.
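To make the "far from optimal" gap concrete, a common way to report model-selection quality is the average fraction of the per-dataset oracle (the best detector in hindsight) that the selected detector achieves. The sketch below assumes this framing and uses hypothetical result/selection dictionaries; it is not the paper's implementation.

```python
import numpy as np

def oracle_ratio(results, selections):
    """Average selected-vs-oracle F1 across datasets.

    results    : dict mapping dataset -> {detector name: F1 score}.
    selections : dict mapping dataset -> detector chosen by the selection method.
    Returns the mean of (selected F1 / best achievable F1) over all datasets.
    """
    ratios = []
    for dataset, per_detector in results.items():
        oracle_f1 = max(per_detector.values())           # best detector in hindsight
        selected_f1 = per_detector[selections[dataset]]  # detector the selector chose
        ratios.append(selected_f1 / oracle_f1 if oracle_f1 > 0 else 0.0)
    return float(np.mean(ratios))
```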
Problem

Research questions and friction points this paper is trying to address.

Benchmarking multivariate time series anomaly detection methods
Evaluating unsupervised model selection techniques
Addressing gaps in adaptive anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest benchmark for multivariate time series anomaly detection and model selection
Evaluates 24 methods including LLM-based detectors
Unified suite for reproducible model comparisons