🤖 AI Summary
Detecting functional outliers that deviate in shape, in magnitude, or in both remains challenging. Method: This paper proposes the Area-Based Epigraph Index (ABEI) and the Area-Based Hypograph Index (ABHI), which build the area between curves into the ordering mechanism and are therefore sensitive to both shape and magnitude deviations. Building on these indices, the authors develop EHyOut, a method that recasts functional outlier detection as a multivariate outlier identification problem: ABEI and ABHI are computed for each curve and for its first and second derivatives, and the resulting feature vectors are passed to robust multivariate estimators such as FastMCD. Contribution/Results: The conventional Modified Epigraph and Hypograph Indices (MEI/MHI) rank curves by the fraction of the domain on which one curve lies above or below another, which makes them effective for shape anomalies but limits their ability to flag magnitude outliers; EHyOut is designed to overcome this limitation. Extensive simulations across a wide range of contamination settings show that it remains stable and often outperforms established benchmark methods. Applications to Spanish weather data and United Nations world population data illustrate its practical utility.
📝 Abstract
Detecting outliers in Functional Data Analysis is challenging because curves can stray from the majority in many different ways. The Modified Epigraph Index (MEI) and Modified Hypograph Index (MHI) rank functions by the fraction of the domain on which one curve lies above or below another. While effective for spotting shape anomalies, their construction limits their ability to flag magnitude outliers. This paper introduces two new metrics, the Area-Based Epigraph Index (ABEI) and the Area-Based Hypograph Index (ABHI), which quantify the area between curves and are therefore sensitive to both magnitude and shape deviations. Building on these indices, we present EHyOut, a robust procedure that recasts functional outlier detection as a multivariate problem: for every curve, and for its first and second derivatives, we compute ABEI and ABHI and then apply multivariate outlier-detection techniques to the resulting feature vectors. Extensive simulations show that EHyOut remains stable across a wide range of contamination settings and often outperforms established benchmark methods. Applications to Spanish weather data and United Nations world population data further illustrate the practical utility and meaningfulness of the methodology.
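The pipeline described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact ABEI/ABHI formulas, so `area_indices` uses a hypothetical area-between-curves analogue of the fraction-based indices, and `robust_scores` is a simple median/MAD stand-in for the robust multivariate step. All function names and the simulated data are made up for the example.

```python
import numpy as np

def fraction_indices(X):
    """Fraction-of-domain indices in the spirit of MEI/MHI.

    X has shape (n_curves, n_points) on a common grid. For each curve j,
    'hypo' is the average fraction of grid points where sample curves lie
    at or below it; 'epi' is one minus the fraction where they lie at or above.
    """
    above = X[None, :, :] >= X[:, None, :]   # above[j, i, k]: x_i(t_k) >= x_j(t_k)
    below = X[None, :, :] <= X[:, None, :]   # below[j, i, k]: x_i(t_k) <= x_j(t_k)
    return 1.0 - above.mean(axis=(1, 2)), below.mean(axis=(1, 2))

def area_indices(X, t):
    """ASSUMED area-based analogue (not the paper's exact definition).

    For each curve j, average the area by which the sample curves lie
    above it (area_above) and below it (area_below).
    """
    diff = X[None, :, :] - X[:, None, :]     # diff[j, i, k] = x_i(t_k) - x_j(t_k)
    length = t[-1] - t[0]
    # Mean over the grid times the domain length approximates the integral.
    area_above = np.clip(diff, 0, None).mean(axis=(1, 2)) * length
    area_below = np.clip(-diff, 0, None).mean(axis=(1, 2)) * length
    return area_above, area_below

def feature_matrix(X, t):
    """Area indices of each curve and of its first and second derivatives."""
    cols, Y = [], X
    for _ in range(3):                       # curve, first and second derivative
        cols.extend(area_indices(Y, t))
        Y = np.gradient(Y, t, axis=1)        # finite-difference derivative
    return np.column_stack(cols)             # shape (n_curves, 6)

def robust_scores(F):
    """Stand-in for a robust multivariate detector: largest per-feature
    median/MAD z-score. The paper applies multivariate techniques instead."""
    med = np.median(F, axis=0)
    mad = 1.4826 * np.median(np.abs(F - med), axis=0)
    return np.max(np.abs(F - med) / mad, axis=1)

# Simulated sample: 30 noisy sine curves, with curve 0 shifted upwards
# to act as a pure magnitude outlier.
rng = np.random.default_rng(0)
n, t = 30, np.linspace(0.0, 1.0, 101)
a = rng.uniform(0.8, 1.2, size=(n, 1))
b = rng.normal(0.0, 0.2, size=(n, 1))
X = a * np.sin(2 * np.pi * t) + b
X[0] += 5.0

epi, hypo = fraction_indices(X)
area_above, area_below = area_indices(X, t)
F = feature_matrix(X, t)
scores = robust_scores(F)
flagged = int(np.argmax(scores))             # index of the most outlying curve
```

In this toy sample the shifted curve lies above every other curve on the whole domain, so its fraction-based hypograph value is 1 no matter how large the shift is, while its area-based value grows with the shift. To stay closer to the robust estimators mentioned in the summary, the median/MAD scores could be replaced by Mahalanobis distances from an MCD fit, for example `sklearn.covariance.MinCovDet`.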