CSTS: A Benchmark for the Discovery of Correlation Structures in Time Series Clustering

📅 2025-05-20
🤖 AI Summary
Time-series clustering lacks objective quality assessment mechanisms and suffers from opaque failure attribution. Method: We propose a structure-driven evaluation paradigm and introduce CSTS—the first synthetic benchmark for multivariate time-series correlation-structure discovery—featuring multi-factor controllable perturbations (e.g., downsampling, distribution shift, sparsification) and robustness validation to address unreliable evaluation under ground-truth absence. Contribution/Results: We achieve the first attributable diagnosis of clustering failures by disentangling degradation in data structure, algorithmic limitations, and misaligned evaluation criteria. Our scalable generation framework and standardized evaluation protocol enable reproducible benchmarking. Empirical analysis reveals previously unknown implicit sensitivity of a widely used clustering algorithm to non-Gaussian distributions—demonstrating how our paradigm advances time-series clustering from empirical practice toward a rigorous, diagnosable, and scientifically grounded evaluation framework.

📝 Abstract
Time series clustering promises to uncover hidden structural patterns in data with applications across healthcare, finance, industrial systems, and other critical domains. However, without validated ground truth information, researchers cannot objectively assess clustering quality or determine whether poor results stem from absent structures in the data, algorithmic limitations, or inappropriate validation methods, raising the question of whether clustering is "more art than science" (Guyon et al., 2009). To address these challenges, we introduce CSTS (Correlation Structures in Time Series), a synthetic benchmark for evaluating the discovery of correlation structures in multivariate time series data. CSTS provides a clean benchmark that enables researchers to isolate and identify specific causes of clustering failures by differentiating between correlation structure deterioration and limitations of clustering algorithms and validation methods. Our contributions are: (1) a comprehensive benchmark for correlation structure discovery with distinct correlation structures, systematically varied data conditions, established performance thresholds, and recommended evaluation protocols; (2) empirical validation of correlation structure preservation showing moderate distortion from downsampling and minimal effects from distribution shifts and sparsification; and (3) an extensible data generation framework enabling structure-first clustering evaluation. A case study demonstrates CSTS's practical utility by identifying an algorithm's previously undocumented sensitivity to non-normal distributions, illustrating how the benchmark enables precise diagnosis of methodological limitations. CSTS advances rigorous evaluation standards for correlation-based time series clustering.
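To make the idea of checking correlation structure preservation concrete, the following is a minimal sketch and not the CSTS generation framework itself. It assumes a single segment of i.i.d. Gaussian observations with a known target correlation matrix, then applies two of the perturbations named in the abstract (downsampling and sparsification) and measures how far the empirical correlations drift from the target. All function names, the target matrix, and the parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def generate_correlated(n_obs, corr, rng):
    """Draw n_obs Gaussian observations whose population correlation is `corr`."""
    chol = np.linalg.cholesky(corr)          # requires a positive-definite matrix
    z = rng.standard_normal((n_obs, corr.shape[0]))
    return z @ chol.T


def downsample_mean(x, factor):
    """Aggregate consecutive blocks of `factor` observations by their mean."""
    n = (x.shape[0] // factor) * factor      # drop the incomplete trailing block
    return x[:n].reshape(-1, factor, x.shape[1]).mean(axis=1)


def sparsify(x, keep_fraction, rng):
    """Keep a random subset of observations while preserving time order."""
    n_keep = int(round(x.shape[0] * keep_fraction))
    idx = np.sort(rng.choice(x.shape[0], size=n_keep, replace=False))
    return x[idx]


def max_corr_error(x, target_corr):
    """Largest absolute gap between empirical and target correlation entries."""
    emp = np.corrcoef(x, rowvar=False)
    return float(np.max(np.abs(emp - target_corr)))


rng = np.random.default_rng(0)

# Hypothetical target structure: variates 0 and 1 strongly correlated,
# variate 2 uncorrelated with both.
target = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

segment = generate_correlated(6000, target, rng)
downsampled = downsample_mean(segment, 60)   # 6000 -> 100 observations
sparse = sparsify(segment, 0.5, rng)         # 6000 -> 3000 observations

errors = {
    "raw": max_corr_error(segment, target),
    "downsampled": max_corr_error(downsampled, target),
    "sparsified": max_corr_error(sparse, target),
}
for name, err in errors.items():
    print(f"{name:12s} max |empirical - target| = {err:.3f}")
```

In this toy setting, mean aggregation leaves the population correlation unchanged, so any extra deviation after downsampling comes only from having fewer observations per segment. The distortion reported for the benchmark itself depends on its own generation and perturbation procedures, which this sketch does not reproduce.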
Problem

Research questions and friction points this paper is trying to address.

Assessing clustering quality without ground truth in time series
Differentiating causes of clustering failures in correlation structures
Providing synthetic benchmark for rigorous clustering evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic benchmark for correlation structure evaluation
Differentiates between structure deterioration and algorithm limitations
Extensible framework for structure-first clustering assessment
Isabella Degen
EPSRC Doctoral Impact Fellow, University of Bristol
AI validation, machine learning, unsupervised learning, time series, type 1 diabetes
Z. Abdallah
School of Engineering Mathematics and Technology, University of Bristol
Henry W J Reeve
University of Bristol
Statistics & Machine Learning
Kate Robson Brown
College of Engineering and Architecture, University College Dublin