Improved Algorithms for Clustering with Noisy Distance Oracles

📅 2026-02-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work studies $k$-means and $k$-center clustering in the weak–strong distance oracle model, where strong-oracle queries are the critical resource. For $k$-means, it adapts k-means++ to this model and obtains a constant-factor approximation using $O(k^2 \log^2 n)$ strong-oracle queries, improving on the previous bound of $O(k^2 \log^4 n \log^2 \log n)$. For $k$-center, a ball-carving algorithm gives a $6(1+\varepsilon)$-approximation using $O(k^3 \log^2 n \log\frac{\log n}{\varepsilon})$ strong-oracle queries, improving on the prior $14(1+\varepsilon)$ guarantee. Experiments on real-world datasets show that the algorithms significantly outperform those of Bateni et al.
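For orientation, here is a minimal sketch of textbook k-means++ ($D^2$) seeding in which every distance is answered by a strong oracle. It is not the paper's algorithm: it is the full-information baseline that the adaptation improves on, included only to show where strong-oracle queries are spent. The `CountingOracle` class, its Euclidean distance, and the function name `kmeanspp_seed` are assumptions made for illustration.

```python
import random


class CountingOracle:
    """Hypothetical strong oracle: exact Euclidean distances plus a query counter."""

    def __init__(self, points):
        self.points = points
        self.queries = 0

    def dist(self, i, j):
        # Each call counts as one strong-oracle query (an assumption of this sketch).
        self.queries += 1
        p, q = self.points[i], self.points[j]
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def kmeanspp_seed(n, k, oracle, rng):
    """Textbook k-means++ seeding over point indices 0..n-1."""
    # First center is chosen uniformly at random.
    centers = [rng.randrange(n)]
    # d2[i] holds the squared distance from point i to its nearest chosen center.
    d2 = [oracle.dist(i, centers[0]) ** 2 for i in range(n)]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every remaining point coincides with a chosen center.
            break
        # Sample the next center with probability proportional to d2.
        c = rng.choices(range(n), weights=d2, k=1)[0]
        centers.append(c)
        # Refresh nearest-center distances against the new center.
        for i in range(n):
            d2[i] = min(d2[i], oracle.dist(i, c) ** 2)
    return centers
```

On $n$ distinct points this baseline issues exactly $n \cdot k$ strong queries, which is linear in $n$; the summary's $O(k^2 \log^2 n)$ bound is what the adapted algorithm is stated to achieve instead.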

📝 Abstract
Bateni et al. have recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong and weak oracles in this model, the authors design approximation algorithms for the $k$-means and $k$-center clustering problems. In this work, we design algorithms with improved guarantees for $k$-means and $k$-center clustering in the weak-strong oracle model. The $k$-means++ algorithm is routinely used to solve $k$-means in settings where complete distance information is available. One of the main contributions of this work is to show that the $k$-means++ algorithm can be adapted to work in the weak-strong oracle model using only a small number of strong-oracle queries, which is the critical resource in this model. In particular, our $k$-means++ based algorithm gives a constant-factor approximation for $k$-means and uses $O(k^2 \log^2{n})$ strong-oracle queries. This improves on the algorithm of Bateni et al., which uses $O(k^2 \log^4{n} \log^2 \log n)$ strong-oracle queries for a constant-factor approximation of $k$-means. For the $k$-center problem, we give a simple ball-carving based $6(1 + \epsilon)$-approximation algorithm that uses $O(k^3 \log^2{n} \log{\frac{\log{n}}{\epsilon}})$ strong-oracle queries. This is an improvement over the $14(1 + \epsilon)$-approximation algorithm of Bateni et al., which uses $O(k^2 \log^4{n} \log^2{\frac{\log{n}}{\epsilon}})$ strong-oracle queries. To show the effectiveness of our algorithms, we perform empirical evaluations on real-world datasets and show that our algorithms significantly outperform the algorithms of Bateni et al.
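As a rough worked comparison of the strong-oracle query bounds quoted in the abstract, dividing one bound by the other gives the ratios below. This ignores the constants hidden by the $O(\cdot)$ notation, which the abstract does not state, so it indicates asymptotic behavior only.

$$
\frac{k^2 \log^4 n \, \log^2 \log n}{k^2 \log^2 n} = \log^2 n \cdot \log^2 \log n
\qquad (k\text{-means: previous / new})
$$

$$
\frac{k^3 \log^2 n \, \log\frac{\log n}{\epsilon}}{k^2 \log^4 n \, \log^2\frac{\log n}{\epsilon}} = \frac{k}{\log^2 n \cdot \log\frac{\log n}{\epsilon}}
\qquad (k\text{-center: new / previous})
$$

Read this way, the $k$-means bound improves by a polylogarithmic factor for every $k$. The $k$-center bound depends on $k^3$ rather than $k^2$, so it is asymptotically smaller only when $k$ grows more slowly than $\log^2 n \cdot \log\frac{\log n}{\epsilon}$; its main gain is the approximation factor, which drops from $14(1+\epsilon)$ to $6(1+\epsilon)$.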
Problem

Research questions and friction points this paper is trying to address.

clustering
noisy distance oracles
k-means
k-center
weak-strong oracle model
Innovation

Methods, ideas, or system contributions that make the work stand out.

weak-strong oracle
k-means++ adaptation
query complexity
clustering approximation
ball-carving algorithm