Clustering Tails in High Dimension

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-dimensional extreme value analysis is hindered by the scarcity of tail observations and by heterogeneity in tail heaviness across variables. To address this, the paper proposes an iterative clustering method that partitions the variables directly by extreme value index, rather than following the conventional two-step approach of estimating each index first and clustering the estimates afterwards. The procedure works from the heaviest-tailed to the lightest-tailed variables: at each step it extracts the group sharing the highest extreme value index among those remaining. The authors establish consistency of the algorithm and examine its finite-sample performance in a simulation study and a real data application. Grouping variables this way is intended as a preliminary step before pooling tail information across dimensions with similar tail properties.

📝 Abstract
One potential solution to the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets sharing similar tail properties, for instance, a common extreme value index. In other words, for a multivariate dataset, we intend to group dimensions into clusters first, before applying any pooling techniques. This paper addresses the problem of clustering the dimensions of a high-dimensional dataset according to their extreme value indices. We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts a group of variables that share the highest extreme value index among the remaining ones. This approach differs fundamentally from conventional two-step methods, which first estimate the extreme value indices and then cluster the estimates. We show the consistency of the proposed algorithm and demonstrate its finite-sample performance using a simulation study and a real data application.
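
For readers unfamiliar with the quantity being clustered: the extreme value index of a heavy-tailed variable is commonly estimated with the Hill estimator, the mean log-excess of the k largest observations over the (k+1)-th largest. The sketch below is only an illustration and is not taken from the paper. The function name hill_estimator and the tuning parameter k are assumptions introduced here, and the data are assumed to be positive.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the extreme value index from the k largest values of x.

    Illustrative helper (not from the paper); assumes positive, heavy-tailed data.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < len(x)")
    order = np.sort(x)[::-1]      # observations in descending order
    threshold = order[k]          # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    # Mean log-excess of the top k observations over the threshold
    return float(np.mean(np.log(order[:k]) - np.log(threshold)))
```

A larger estimate corresponds to a heavier tail. For example, a classical Pareto sample with tail exponent 2 has extreme value index 1/2. The choice of k trades bias against variance and is left to the user in this sketch.
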
Problem

Research questions and friction points this paper is trying to address.

Clustering high-dimensional data by tail properties
Grouping dimensions with similar extreme value indices
Developing an iterative method that proceeds from the heaviest to the lightest tails
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative clustering for tail distributions
Groups variables by extreme value indices
Sequential partitioning from heaviest to lightest tails
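
The heaviest-first ordering described above can be pictured with a simple peeling loop. This summary does not give the paper's actual extraction rule, so the sketch below substitutes a deliberately naive one: it computes a Hill estimate per column once and, at each step, extracts every remaining column whose estimate lies within a tolerance of the current maximum. Because it relies on pre-estimated indices, it is closer to the two-step baseline than to the proposed method, and it only shows the order in which groups are extracted. The names hill and peel_clusters and the parameters k and tol are hypothetical.

```python
import numpy as np

def hill(x, k):
    """Hill estimate of the extreme value index (assumes positive data, 1 <= k < len(x))."""
    order = np.sort(np.asarray(x, dtype=float))[::-1]
    return float(np.mean(np.log(order[:k]) - np.log(order[k])))

def peel_clusters(data, k, tol):
    """Group columns of `data` from heaviest to lightest tail.

    Naive plug-in rule for illustration only (NOT the paper's algorithm):
    at each step, take all remaining columns whose Hill estimate is within
    `tol` of the largest remaining estimate.
    """
    data = np.asarray(data, dtype=float)
    gammas = np.array([hill(data[:, j], k) for j in range(data.shape[1])])
    remaining = list(range(data.shape[1]))
    clusters = []
    while remaining:
        top = max(gammas[j] for j in remaining)            # heaviest remaining tail
        group = [j for j in remaining if gammas[j] >= top - tol]
        clusters.append(group)                             # extract this group
        remaining = [j for j in remaining if j not in group]
    return clusters, gammas
```

The returned list is ordered from the heaviest-tailed group to the lightest-tailed one, mirroring the sequential partitioning listed above. The tolerance tol stands in for whatever data-driven criterion the paper uses to decide which variables share the highest index.
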