🤖 AI Summary
This work addresses the challenge of effectively detecting unknown anomalies in autonomous driving systems under the absence of anomaly labels. To this end, we propose a self-supervised online anomaly detection framework based on the Joint Embedding Predictive Architecture (JEPA), which, to the best of our knowledge, is the first to apply JEPA to temporal object state data in autonomous driving. By learning latent representations from unlabeled data, the method enables effective identification of previously unseen anomalies. Integrating classical anomaly detection techniques, our approach demonstrates high detection performance for object state anomalies in unsupervised settings, as validated on the real-world nuScenes dataset. This significantly enhances the online robustness of autonomous driving systems against unexpected and anomalous scenarios.
📝 Abstract
As autonomous vehicles are rolled out, measures must be taken to ensure their safe operation. In order to supervise a system that is already in operation, monitoring frameworks are frequently employed. These run continuously online in the background, supervising the system status and recording anomalies. This work proposes an online monitoring framework to detect anomalies in object state representations. Thereby, a key challenge is creating a framework for anomaly detection without anomaly labels, which are usually unavailable for unknown anomalies. To address this issue, this work applies a self-supervised embedding method to translate object data into a latent representation space. For this, a JEPA-based self-supervised prediction task is constructed, allowing training without anomaly labels and the creation of rich object embeddings. The resulting expressive JEPA embeddings serve as input for established anomaly detection methods, in order to identify anomalies within object state representations. This framework is particularly useful for applications in real-world environments, where new or unknown anomalies may occur during operation for which there are no labels available. Experiments performed on the publicly available, real-world nuScenes dataset illustrate the framework's capabilities.