🤖 AI Summary
For two-dimensional outlier detection, exact algorithms are computationally expensive, while existing heuristics are not area-based: distance-based peeling repeatedly removes the point farthest from the mean, and peeling one convex layer at a time fails to recover outliers in many cases. To address this, we propose an area-based iterative convex-hull peeling algorithm. Its core idea is to use **convex hull area minimization** as the greedy peeling criterion: in each iteration, the hull point whose removal most reduces the area of the current convex hull is eliminated, so peeling *k* points yields *k* outliers. The criterion is geometrically interpretable and generalizes to other objectives (e.g., perimeter minimization). The algorithm runs in *O(n log n)* time and *O(n)* space for any *k*, a significant speedup over the fastest exact methods (e.g., the *O(n² log n + (n−k)³)* time algorithm of Eppstein et al.).
📝 Abstract
We present a novel 2D convex hull peeling algorithm for outlier detection, which repeatedly removes the point on the hull that decreases the hull's area the most. To find k outliers among n points, one simply peels k points. The algorithm is an efficient heuristic for exact methods, which find the k points whose removal together results in the smallest convex hull. Our algorithm runs in O(n log n) time using O(n) space for any choice of k. This is a significant speedup compared to the fastest exact algorithms: that of Eppstein et al., which runs in O(n^2 log n + (n-k)^3) time using O(n log n + (n-k)^3) space, and that of Atanassov et al., which runs in O(n log n + C(4k, 2k) (3k)^k n) time, where C(4k, 2k) is the binomial coefficient. Existing heuristic peeling approaches are not area-based. Instead, an approach by Harsh et al. repeatedly removes the point farthest from the mean using various distance metrics and runs in O(n log n + kn) time. Other approaches greedily peel one convex layer at a time, which is efficient when using an O(n log n) time algorithm by Chazelle to compute the convex layers. However, in many cases this fails to recover outliers. For most values of n and k, our approach is the fastest and first practical choice for finding outliers based on minimizing the area of the convex hull. Our algorithm also generalizes to other objectives such as perimeter.
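The greedy criterion described above can be illustrated with a deliberately naive Python sketch. It is not the O(n log n) algorithm from the paper: it recomputes the convex hull from scratch for every candidate, so it is far slower, and it only shows which point each peeling step selects. The function names (`convex_hull`, `polygon_area`, `polygon_perimeter`, `peel_outliers`) and the `objective` parameter are illustrative assumptions, and the input points are assumed to be distinct.

```python
import math

def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(poly):
    """Sum of edge lengths of a closed polygon."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)])
               for i in range(len(poly)))

def peel_outliers(points, k, objective=polygon_area):
    """Naive greedy peeling (illustration only, assumes distinct points).

    At each step, remove the hull point whose removal leaves the hull
    with the smallest objective value, i.e. the largest reduction.
    """
    remaining = list(points)
    removed = []
    for _ in range(k):
        hull = convex_hull(remaining)
        if not hull:
            break
        best = min(
            hull,
            key=lambda p: objective(convex_hull([q for q in remaining if q != p])),
        )
        removed.append(best)
        remaining.remove(best)
    return removed, remaining

# Hypothetical data: a 4x4 square with interior points and one far-away point.
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 1), (3, 1), (10, 10)]
removed, remaining = peel_outliers(pts, k=1)
print(removed)
```

Passing `objective=polygon_perimeter` swaps in the perimeter criterion mentioned in the abstract. The efficient data structures needed to reach the stated O(n log n) bound are not reproduced here.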