AI Summary
This study addresses the long-standing scarcity of large-scale, real-world anomaly detection benchmark datasets in safety-critical domains such as transportation. To this end, it introduces EngineAD, a new multivariate engine anomaly detection dataset comprising six months of high-resolution sensor telemetry from 25 commercial vehicles. The data are expert-annotated and preprocessed into 300-timestep segments represented by eight principal components. Using this dataset, the authors systematically evaluate nine one-class anomaly detection methods, including K-Means, One-Class SVM, and deep learning models. The results show that classical approaches are often competitive with or superior to deep models, and that performance varies substantially across vehicles, highlighting the difficulty of cross-vehicle generalization. This work establishes an initial benchmark and provides empirical insights for industrial-scale early fault detection.
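The exact preprocessing pipeline is not spelled out here, so the following is only a minimal sketch of what "300-timestep segments represented by eight principal components" can look like. It assumes PCA is fitted across sensor channels after per-channel standardization, and that segments are non-overlapping windows with any leftover timesteps dropped. The function names (`fit_pca`, `project`, `segment`) and the 20-channel synthetic input are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components=8):
    """Fit PCA to X of shape (n_timesteps, n_channels) after per-channel standardization."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant channels
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]

def project(X, mean, std, components):
    """Map raw telemetry onto the retained components: (n_timesteps, n_components)."""
    return ((X - mean) / std) @ components.T

def segment(P, length=300):
    """Cut (n_timesteps, k) into non-overlapping (length, k) windows, dropping any remainder."""
    n_segments = P.shape[0] // length
    return P[: n_segments * length].reshape(n_segments, length, P.shape[1])

# Synthetic stand-in: 10,000 timesteps from 20 hypothetical sensor channels.
rng = np.random.default_rng(0)
telemetry = rng.normal(size=(10_000, 20))
mean, std, components = fit_pca(telemetry, n_components=8)
segments = segment(project(telemetry, mean, std, components), length=300)
print(segments.shape)  # (33, 300, 8)
```

In a real evaluation the PCA statistics would typically be fitted on training data only and then reused for test vehicles, so that no information leaks from the evaluation split.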
Abstract
The progress of Anomaly Detection (AD) in safety-critical domains, such as transportation, is severely constrained by the lack of large-scale, real-world benchmarks. To address this, we introduce EngineAD, a novel multivariate dataset comprising high-resolution sensor telemetry collected from a fleet of 25 commercial vehicles over a six-month period. Unlike synthetic datasets, EngineAD features authentic operational data labeled with expert annotations, distinguishing normal states from subtle indicators of incipient engine faults. We preprocess the data into $300$-timestep segments of $8$ principal components and establish an initial benchmark using nine diverse one-class anomaly detection models. Our experiments reveal significant performance variability across the vehicle fleet, underscoring the challenge of cross-vehicle generalization. Furthermore, our findings corroborate recent literature, showing that simple classical methods (e.g., K-Means and One-Class SVM) are often highly competitive with, or superior to, deep learning approaches in this segment-based evaluation. By publicly releasing EngineAD, we aim to provide a realistic, challenging resource for developing robust and field-deployable anomaly detection and anomaly prediction solutions for the automotive industry.
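To make "simple classical methods" concrete, here is a minimal sketch of a common way K-Means is used as a one-class detector: cluster segments from normal operation only, then score each new segment by its distance to the nearest centroid. This is an illustration under stated assumptions, not necessarily the configuration benchmarked on EngineAD. It assumes each 300 x 8 segment is flattened into a 2,400-dimensional vector, and the data are synthetic Gaussian noise with a mean shift standing in for faults. The helper names (`kmeans`, `anomaly_scores`, `flatten`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on rows of X; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids

def anomaly_scores(X, centroids):
    """Euclidean distance to the nearest centroid; larger means more anomalous."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))

def flatten(segments):
    """(n, 300, 8) segments -> (n, 2400) feature vectors."""
    return segments.reshape(len(segments), -1)

# Synthetic data only: "normal" segments and mean-shifted "anomalous" ones.
rng = np.random.default_rng(1)
train = rng.normal(size=(200, 300, 8))
test_normal = rng.normal(size=(20, 300, 8))
test_anomalous = rng.normal(loc=0.5, size=(20, 300, 8))

centroids = kmeans(flatten(train), k=4)  # fitted on normal data only
threshold = np.quantile(anomaly_scores(flatten(train), centroids), 0.99)
print(anomaly_scores(flatten(test_anomalous), centroids) > threshold)
```

In practice, scikit-learn's `KMeans` (whose `transform` method returns distances to each cluster center) or `OneClassSVM` (via `decision_function`) would replace the hand-written clustering. The appeal of this kind of baseline is that it has few hyperparameters and no training instability, which is one plausible reason such methods hold up well against deep models in segment-based evaluation.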