DCSI - An improved measure of cluster separability based on separation and connectedness

📅 2023-10-19
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing classification complexity measures and cluster validity indices (CVIs) do not adequately capture between-class separation and within-class connectedness, the two central aspects of separability for density-based clustering, so they cannot reliably indicate whether class labels correspond to meaningful density clusters. The paper proposes the density cluster separability index (DCSI), which aims to quantify both properties and can also be used as a CVI. On synthetic data, DCSI correlates strongly with the performance of DBSCAN as measured by the adjusted Rand index (ARI), but it lacks robustness on multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. On frequently used real-world data sets, DCSI correctly identifies touching or overlapping classes that do not correspond to meaningful density-based clusters.
📝 Abstract
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
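The abstract scores DBSCAN performance with the adjusted Rand index (ARI). As a reminder of what that number measures, here is a minimal pure-Python sketch of the standard pair-counting ARI. The function name `adjusted_rand_index` and the handling of degenerate inputs are choices made for this illustration, not taken from the paper. Noise points (often labelled `-1` by DBSCAN implementations) would simply be treated as one more label here, which is an assumption rather than the paper's protocol.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two points: treated as perfect agreement (assumption).
        return 1.0

    # Contingency counts: points sharing each (true, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs that are grouped together in each view.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same partition with renamed labels: perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Partition that cuts across the true classes: below chance level.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

An ARI of 1 means the clustering reproduces the class labels up to renaming, values near 0 correspond to chance-level agreement, and negative values indicate less agreement than expected by chance.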
Problem

Research questions and friction points this paper is trying to address.

Improves cluster separability measurement for density-based clustering
Quantifies between-class separation and within-class connectedness
Evaluates meaningfulness of clusters in real-world data sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

DCSI measures cluster separability
Incorporates separation and connectedness
Evaluates density-based clustering performance
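To make the two ingredients named above concrete, the following toy sketch computes a separation value and a connectedness value for two classes and combines them into one score. This is not the paper's definition of DCSI: it skips any density-based selection of points, and the specific choices (minimum between-class distance as separation, largest minimum-spanning-tree edge as connectedness, the mapping `q / (1 + q)`, and all function names) are assumptions made purely for illustration.

```python
from math import dist, inf


def separation(a, b):
    """Smallest Euclidean distance between a point of class a and one of class b."""
    return min(dist(p, q) for p in a for q in b)


def largest_mst_edge(points):
    """Longest edge of the Euclidean minimum spanning tree (Prim's algorithm).

    A small value means every point can be reached from every other point
    of the class through short hops, i.e. the class is well connected.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    largest = 0.0
    for _ in range(n):
        # Add the not-yet-included point that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        largest = max(largest, best[u])
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return largest


def separability_sketch(a, b):
    """Toy score in [0, 1]: high when the gap between classes is large
    relative to the largest gap inside either class."""
    sep = separation(a, b)
    conn = max(largest_mst_edge(a), largest_mst_edge(b))
    if conn == 0.0:
        # Single points or exact duplicates only (handling is an assumption).
        return 1.0 if sep > 0 else 0.0
    q = sep / conn
    return q / (1 + q)


if __name__ == "__main__":
    left = [(0.0, 0.0), (1.0, 0.0)]
    far = [(5.0, 0.0), (6.0, 0.0)]
    near = [(2.0, 0.0), (3.0, 0.0)]
    print(separability_sketch(left, far))   # well separated classes
    print(separability_sketch(left, near))  # touching classes
```

In this toy setup the score is 0.5 when the gap between the classes equals the largest gap inside a class, which is the situation where a density-based method has no basis for telling the two classes apart.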
Jana Gauss
Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany; Munich Center for Machine Learning, Munich, Germany
Fabian Scheipl
LMU Munich / Munich Center for Machine Learning
Statistics, Bayesian Statistics, Functional Data Analysis, Additive Models, Survival Analysis
Moritz Herrmann
Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany; Munich Center for Machine Learning, Munich, Germany; Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany