🤖 AI Summary
Existing clustering validity indices were designed for Euclidean data. Whether they can reliably assess correlation-based clustering of multivariate time series has not been systematically established. Correlation structures lie in a continuous space that offers no discrete reference patterns comparable to the geometric shapes used in Euclidean clustering.
Method: We propose a standardized clustering validity framework tailored to correlation patterns. Its core idea is to use canonical correlation patterns as validation targets. These mathematically defined reference structures discretize the infinite correlation space into a finite set of interpretable, comparable patterns. Within this framework, the L1 norm maps correlation structures to canonical patterns. The L5 norm serves as the dissimilarity measure for the silhouette coefficient and the Davies–Bouldin index.
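The mapping step can be illustrated with a minimal Python sketch. The summary does not define the canonical patterns, so the set below is an assumption for illustration only. For three variables, each pattern is a vector of off-diagonal correlations (r12, r13, r23) with entries in {-1, 0, 1}, and it is kept only if it forms a valid (positive semidefinite) correlation matrix. An empirical correlation vector is then mapped to the pattern with the smallest L1 distance. The helper names `canonical_patterns`, `correlation_vector`, and `nearest_pattern` are hypothetical.

```python
from itertools import product
from math import sqrt


def canonical_patterns():
    """Hypothetical canonical patterns for three variables (illustration only).

    Each pattern is (r12, r13, r23) with entries in {-1, 0, 1}, kept only if
    the 3x3 matrix with unit diagonal is positive semidefinite.
    """
    patterns = []
    for r12, r13, r23 in product((-1, 0, 1), repeat=3):
        # With a unit diagonal and |r| <= 1 the 1x1 and 2x2 principal minors are
        # non-negative, so validity reduces to a non-negative determinant.
        det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
        if det >= 0:
            patterns.append((r12, r13, r23))
    return patterns


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_vector(x1, x2, x3):
    """Off-diagonal correlations (r12, r13, r23) of one time-series segment."""
    return (pearson(x1, x2), pearson(x1, x3), pearson(x2, x3))


def l1(u, v):
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_pattern(vec, patterns=None):
    """Map a correlation vector to the canonical pattern with smallest L1 distance.

    Ties are broken by enumeration order (min returns the first minimum).
    """
    if patterns is None:
        patterns = canonical_patterns()
    return min(patterns, key=lambda pat: l1(vec, pat))


x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]   # perfectly correlated with x1
x3 = [1, -1, 1, -1, 1]  # uncorrelated with x1 and x2
print(len(canonical_patterns()))                        # 11
print(nearest_pattern(correlation_vector(x1, x2, x3)))  # (1, 0, 0)
```

Under this assumed pattern set the enumeration yields 11 valid patterns. A real implementation would also need an explicit policy for correlation vectors that are equally distant from two patterns.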
Results: We evaluated the framework on synthetic datasets with perfect ground truth under controlled conditions. Canonical patterns proved to be reliable validation targets. The recommended methods are robust to distribution shifts and appropriately detect degradation of correlation structure. The framework thus offers an interpretable, domain-agnostic tool for validating correlation-based clustering, which is particularly important in high-stakes applications such as finance and healthcare.
📝 Abstract
Clustering of multivariate time series using correlation-based methods reveals regime changes in relationships between variables across health, finance, and industrial applications. However, validating whether discovered clusters represent distinct relationships rather than arbitrary groupings remains a fundamental challenge. Existing clustering validity indices were developed for Euclidean data, and their effectiveness for correlation patterns has not been systematically evaluated. Unlike Euclidean clustering, where geometric shapes provide discrete reference targets, correlations exist in a continuous space without equivalent reference patterns. We address this validation gap by introducing canonical correlation patterns as mathematically defined validation targets that discretise the infinite correlation space into finite, interpretable reference patterns. Using synthetic datasets with perfect ground truth across controlled conditions, we demonstrate that canonical patterns provide reliable validation targets. The L1 norm for mapping and the L5 norm for the silhouette width criterion and the Davies-Bouldin index show superior performance. These methods are robust to distribution shifts and appropriately detect correlation structure degradation, which allows us to derive practical implementation guidelines. This work establishes a methodological foundation for rigorous correlation-based clustering validation in high-stakes domains.
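A similar sketch shows how the L5 norm might replace the Euclidean distance in the silhouette width criterion and the Davies-Bouldin index. It rests on three assumptions not stated in the abstract. First, each segment is represented by its correlation vector. Second, the L5 (Minkowski, p = 5) distance is substituted directly into the standard definitions. Third, cluster centroids are arithmetic means. The function names and the four example points are hypothetical and are not taken from the paper.

```python
from math import fsum


def minkowski(u, v, p=5.0):
    """L_p distance between two vectors; p = 5 gives the L5 norm."""
    return fsum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def group_by_label(points, labels):
    """Collect points into a dict mapping each label to its members."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters


def silhouette_width(points, labels, p=5.0):
    """Mean silhouette width with the L_p distance in place of Euclidean."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of x's cluster
        # (the distance from x to itself is 0, so divide by n - 1).
        a = fsum(minkowski(x, y, p) for y in own) / (len(own) - 1)
        # b: smallest mean distance from x to the members of any other cluster.
        b = min(
            fsum(minkowski(x, y, p) for y in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return fsum(scores) / len(scores)


def centroid(members):
    """Arithmetic mean of a list of equal-length vectors."""
    return tuple(fsum(col) / len(members) for col in zip(*members))


def davies_bouldin(points, labels, p=5.0):
    """Davies-Bouldin index with the L_p distance (lower is better)."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = {
        lab: fsum(minkowski(x, cents[lab], p) for x in m) / len(m)
        for lab, m in clusters.items()
    }
    worst = []
    for i in clusters:
        # Assumes distinct centroids; identical centroids would divide by zero.
        worst.append(max(
            (scatter[i] + scatter[j]) / minkowski(cents[i], cents[j], p)
            for j in clusters
            if j != i
        ))
    return fsum(worst) / len(worst)


# Hypothetical correlation vectors (r12, r13, r23) for four segments.
points = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.0, 0.0), (0.05, -0.05, 0.0)]
matching = [0, 0, 1, 1]   # groups segments with similar correlation structure
scrambled = [0, 1, 0, 1]  # mixes the two structures
print(silhouette_width(points, matching), davies_bouldin(points, matching))
print(silhouette_width(points, scrambled), davies_bouldin(points, scrambled))
```

With the labelling that matches the data, the silhouette width should be close to 1 and the Davies-Bouldin index close to 0. The scrambled labelling should give a negative silhouette width and a much larger Davies-Bouldin index.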