AI Summary
This work addresses the k-means and k-center clustering problems in a restricted setting where only a weak-strong distance oracle is available. By adapting k-means++ to this model for the first time and integrating a ball-carving strategy, the proposed algorithm substantially reduces the number of strong oracle queries. For k-means, it achieves a constant-factor approximation using only $O(k^2 \log^2 n)$ strong queries, improving upon the previous bound of $O(k^2 \log^4 n \log^2 \log n)$. For k-center, it obtains a $6(1+\varepsilon)$-approximation, enhancing the prior $14(1+\varepsilon)$ guarantee. Experimental results demonstrate significant improvements in both query efficiency and clustering quality.
Abstract
Bateni et al. recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to both a strong oracle and a weak oracle, they design approximation algorithms for the $k$-means and $k$-center clustering problems. In this work, we design algorithms with improved guarantees for $k$-means and $k$-center clustering in the weak-strong oracle model. The $k$-means++ algorithm is routinely used to solve $k$-means in settings where complete distance information is available. One of the main contributions of this work is to show that the $k$-means++ algorithm can be adapted to the weak-strong oracle model using only a small number of strong-oracle queries, which are the critical resource in this model. In particular, our $k$-means++-based algorithm gives a constant-factor approximation for $k$-means and uses $O(k^2 \log^2{n})$ strong-oracle queries. This improves on the algorithm of Bateni et al., which uses $O(k^2 \log^4n \log^2 \log n)$ strong-oracle queries for a constant-factor approximation of $k$-means. For the $k$-center problem, we give a simple ball-carving-based $6(1 + \epsilon)$-approximation algorithm that uses $O(k^3 \log^2{n} \log{\frac{\log{n}}{\epsilon}})$ strong-oracle queries. This improves the approximation factor over the $14(1 + \epsilon)$-approximation algorithm of Bateni et al., which uses $O(k^2 \log^4{n} \log^2{\frac{\log{n}}{\epsilon}})$ strong-oracle queries. To show the effectiveness of our algorithms, we perform empirical evaluations on real-world datasets and show that our algorithms significantly outperform the algorithms of Bateni et al.
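For readers unfamiliar with the baseline, the following is a minimal sketch of the classical $k$-means++ seeding step ($D^2$ sampling) in the standard setting where every distance can be computed exactly. It is not the weak-strong oracle adaptation described in the abstract, whose details are not given here. The function names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Classical k-means++ seeding with full distance information.

    Illustrative sketch only: every distance here is exact, whereas the
    weak-strong oracle model restricts access to exact distances.
    """
    if rng is None:
        rng = random.Random(0)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point coincides with an already chosen center
        # D^2 sampling: pick a point with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Refresh nearest-center distances against the new center.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

Each round of this baseline computes a distance from every point to the new center, which is the kind of exact-distance access that strong-oracle queries ration in the weak-strong model.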