🤖 AI Summary
Density-based clustering (e.g., DBSCAN) handles noise and arbitrarily shaped clusters better than centroid-based methods, yet its neighborhood radius parameter is expensive to tune in large-scale, high-dimensional settings. This work shows empirically that the number of clusters is approximately unimodal with respect to the radius parameter, and supports this claim theoretically in a specific setting. Leveraging this property, we propose a framework for automatic radius tuning via ternary search, integrated with density-reachability analysis and applied to high-dimensional feature embeddings (for NLP, speech, and computer vision). Our approach improves search efficiency without compromising clustering quality. Evaluated on multimodal, large-scale datasets, it achieves up to 8.2× faster hyperparameter optimization while matching or surpassing manually tuned performance.
📝 Abstract
Density-based clustering methods often surpass their centroid-based counterparts on data with noise or arbitrary distributions, both of which are common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods: the relation between the number of clusters and the neighborhood radius of core points. We show empirically that this relation is nearly unimodal, and we support the claim theoretically in a specific setting. We leverage this property to devise new strategies, based on the Ternary Search algorithm, for finding appropriate radius values more efficiently. This is especially important for large-scale, high-dimensional data, where parameter tuning is computationally intensive. We validate our methodology through extensive applications across a range of high-dimensional, large-scale NLP, Audio, and Computer Vision tasks, demonstrating its practical effectiveness and robustness. This work not only advances parameter control for density-based clustering but also broadens the understanding of the relations between the parameters that guide these methods. Our code is available at https://github.com/oronnir/UnimodalStrategies.
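To make the search idea concrete, the following is a minimal sketch of plain ternary search over the radius, assuming the cluster count is strictly unimodal on the search interval. It is not the authors' implementation (see the linked repository for that). The names `ternary_search_max` and `toy_cluster_count` are hypothetical, and the toy function is only a stand-in for an actual clustering run.

```python
from typing import Callable


def ternary_search_max(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-3,
    max_evals: int = 200,
) -> float:
    """Approximate the maximizer of f on [lo, hi], assuming f is unimodal."""
    evals = 0
    while hi - lo > tol and evals < max_evals:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            # The peak cannot lie left of m1, so drop [lo, m1).
            lo = m1
        else:
            # The peak cannot lie right of m2, so drop (m2, hi].
            hi = m2
        evals += 2  # two evaluations of f per iteration
    return (lo + hi) / 2.0


def toy_cluster_count(eps: float) -> float:
    # Hypothetical stand-in for "number of clusters at radius eps".
    # A real use would run a density-based clustering at this radius and
    # count the resulting clusters; this toy curve peaks at eps = 0.4.
    return 50.0 - 100.0 * (eps - 0.4) ** 2


best_eps = ternary_search_max(toy_cluster_count, 0.0, 2.0)
print(f"radius with the most clusters: {best_eps:.3f}")
```

Each iteration discards a third of the interval at the cost of two evaluations, so the number of clustering runs grows logarithmically with the required precision rather than linearly as in a grid sweep. Because the abstract describes the relation as only nearly unimodal, plateaus or small local bumps in a real cluster-count curve can mislead this plain version, and a practical strategy would need to account for them.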