🤖 AI Summary
High-dimensional extreme value analysis is hampered by the scarcity of tail observations and by heterogeneity in tail heaviness across variables. To address this, we propose an iterative clustering method for the dimensions of a multivariate dataset, grounded in extreme value theory. Instead of the conventional two-step “estimate-then-cluster” approach, the method partitions the variables directly: at each step it extracts the group sharing the highest extreme value index among the remaining variables, so that clusters emerge in order from the heaviest-tailed to the lightest-tailed. We establish consistency of the algorithm and assess its finite-sample performance in a simulation study and a real data application. The resulting clusters are intended as a first step before pooling tail information across variables that share a common extreme value index.
📝 Abstract
One potential solution to the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets that share similar tail properties, for instance, a common extreme value index. For a multivariate dataset, this means first grouping the dimensions into clusters and only then applying a pooling technique. This paper addresses the clustering problem for a high-dimensional dataset, grouping its dimensions according to their extreme value indices.
We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts the group of variables that share the highest extreme value index among the remaining ones. This approach differs fundamentally from conventional two-step methods, which first estimate the extreme value index of each variable and then cluster the estimates.
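To make the heaviest-first extraction order concrete, here is a minimal Python sketch. It is not the procedure proposed in the paper, whose details the abstract does not give. As an assumption, it uses the Hill estimator and a hypothetical tolerance `tol` to decide which variables share the current highest index, which places it closer to the plug-in approach the abstract contrasts itself with. The names `hill_estimator`, `iterative_tail_clusters`, `k`, and `tol` are illustrative only.

```python
import numpy as np


def hill_estimator(x, k):
    """Hill estimate of the extreme value index from the k largest values of x.

    Assumes positive observations and 1 <= k < len(x).
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    top = x_sorted[-k:]            # the k largest observations
    threshold = x_sorted[-k - 1]   # the (k+1)-th largest observation
    return float(np.mean(np.log(top) - np.log(threshold)))


def iterative_tail_clusters(data, k, tol):
    """Peel off groups of columns from heaviest-tailed to lightest-tailed.

    data : (n, d) array of positive observations, one column per variable.
    k    : number of upper order statistics used by the Hill estimator.
    tol  : hypothetical tolerance; columns whose estimate lies within tol
           of the current maximum are placed in the same group.
    Returns a list of lists of column indices, heaviest-tailed group first.
    """
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    gamma = np.array([hill_estimator(data[:, j], k) for j in range(d)])

    remaining = list(range(d))
    clusters = []
    while remaining:
        # Highest estimated index among the variables not yet assigned.
        g_max = max(gamma[j] for j in remaining)
        # Extract every remaining variable close enough to that maximum.
        group = [j for j in remaining if g_max - gamma[j] <= tol]
        clusters.append(group)
        remaining = [j for j in remaining if j not in group]
    return clusters


if __name__ == "__main__":
    # Simulated Pareto columns with extreme value indices 1, 1, 0.25, 0.25.
    rng = np.random.default_rng(0)
    n = 5000
    true_gammas = [1.0, 1.0, 0.25, 0.25]
    sample = np.column_stack(
        [(1.0 - rng.random(n)) ** (-g) for g in true_gammas]
    )
    print(iterative_tail_clusters(sample, k=500, tol=0.3))
```

The loop always terminates, because the variable attaining the current maximum is in every extracted group. Both `k` and `tol` would need to be tuned in practice, and the sketch offers none of the consistency guarantees claimed for the proposed algorithm.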
We establish the consistency of the proposed algorithm and demonstrate its finite-sample performance in a simulation study and a real data application.