🤖 AI Summary
This work proposes a unified framework for anomaly detection based on surprisal, addressing the limitations of traditional methods that rely on ad hoc rules or strong modeling assumptions and often fail to identify “inlier” anomalies in low-density regions. The approach defines anomaly scores as the surprisal of observations under a possibly misspecified model and quantifies anomaly severity via upper-tail probabilities. By reducing high-dimensional anomaly detection to a univariate tail estimation problem, the method combines the empirical distribution with the Generalized Pareto Distribution (GPD) to estimate tail probabilities and leverages the Dvoretzky–Kiefer–Wolfowitz inequality to provide finite-sample confidence guarantees. Experiments demonstrate that the method robustly detects both tail and inlier anomalies even under substantial model misspecification, achieving strong performance on synthetic data as well as real-world datasets including French mortality records and Test cricket matches.
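The core idea above — score each observation by the upper-tail probability of its surprisal — can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the two-component Gaussian mixture working model, the mode locations, and the helper names (`surprisal`, `anomaly_score`) are all assumptions chosen to show how an “inlier” point in the low-density gap between modes gets flagged alongside a far-tail point.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Reference sample from a two-mode population (illustrative; the framework
# allows the working model below to be misspecified).
x = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])

def surprisal(v, locs=(-4.0, 4.0), scale=1.0):
    # Negative log density under an equal-weight two-component Gaussian mixture.
    dens = 0.5 * (stats.norm.pdf(v, locs[0], scale) +
                  stats.norm.pdf(v, locs[1], scale))
    return -np.log(dens)

s_ref = surprisal(x)  # surprisal of each reference observation

def anomaly_score(v):
    # Empirical upper-tail probability P(S >= s(v)): small scores = anomalous.
    return float(np.mean(s_ref >= surprisal(v)))

# A far-tail point (8.0) and an inlier anomaly in the gap between modes (0.0)
# both get near-zero scores; a typical point near a mode (4.1) does not.
print(anomaly_score(8.0), anomaly_score(0.0), anomaly_score(4.1))
```

Note that the inlier at 0.0 would be invisible to a purely tail-based rule, but its surprisal is extreme under the mixture, so its tail probability is small.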
📝 Abstract
Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they typically focus on tail events, missing “inlier” anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky–Kiefer–Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.
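The two tail estimators in the abstract can be sketched as follows. This is a minimal illustration under assumed inputs (an exponential stand-in for the surprisal sample, a 95%-quantile threshold, helper names `empirical_tail` and `gpd_tail`), not the authors' code: the empirical estimator carries a Dvoretzky–Kiefer–Wolfowitz band of half-width sqrt(log(2/α)/(2n)), and the extreme-value estimator fits a GPD to exceedances above a high threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-in for the univariate surprisal sample S_1, ..., S_n.
s = rng.exponential(scale=1.0, size=2000)
n = s.size

def empirical_tail(t, alpha=0.05):
    """Estimate P(S >= t) with a finite-sample DKW confidence band.

    DKW bounds sup_t |F_n(t) - F(t)| by eps = sqrt(log(2/alpha) / (2n))
    with probability at least 1 - alpha, uniformly over t.
    """
    p_hat = float(np.mean(s >= t))
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    return p_hat, max(p_hat - eps, 0.0), min(p_hat + eps, 1.0)

# Extreme-value estimator: GPD fitted to exceedances above a high threshold.
u = float(np.quantile(s, 0.95))            # high threshold
exc = s[s > u] - u                         # exceedances over u
xi, _, beta = stats.genpareto.fit(exc, floc=0.0)  # shape xi, scale beta

def gpd_tail(t):
    # P(S >= t) ~= P(S > u) * P(GPD exceedance > t - u), for t > u.
    return float(np.mean(s > u)) * stats.genpareto.sf(t - u, xi, loc=0.0, scale=beta)
```

The empirical estimator is reliable where the sample is dense but degenerates to 0 beyond the largest observed surprisal; the GPD estimator extrapolates into that region, which is why the paper pairs the two.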