Clustering What Matters in Constrained Settings

📅 2023-04-29
🏛️ International Symposium on Algorithms and Computation
📈 Citations: 4
✨ Influential: 1
📄 PDF
🤖 AI Summary
This paper studies constrained clustering with outliers: given a metric space, compute a $k$-median or $k$-means solution that satisfies hard capacity constraints while discarding at most $m$ outlier points. We propose the first generic reduction framework that systematically converts the outlier version into the outlier-free version with a $(1+\varepsilon)$-approximation guarantee. Our method combines instance enumeration, distance-space mapping, and parameterized algorithm design, and applies to arbitrary metric spaces. Specifically, it yields an FPT $(1+\varepsilon)$-approximation in Euclidean space; FPT $(3+\varepsilon)$- and $(9+\varepsilon)$-approximations in general metric spaces; and, notably, the first $(2-\delta)$-approximation for the Ulam metric, breaking the long-standing 2-approximation barrier. These results substantially broaden the theoretical boundaries and tractability of constrained clustering with outliers.
๐Ÿ“ Abstract
Constrained clustering problems generalize classical clustering formulations, e.g., $k$-median, $k$-means, by imposing additional constraints on the feasibility of clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to $f(k, m, \varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon} \right)^{O(m)}$. As specific applications, we get the following results:

- First FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.

Our work generalizes the known results to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
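The shape of the reduction, solving many outlier-free instances and keeping the cheapest answer, can be illustrated with a deliberately naive sketch. This is not the paper's construction: the code below enumerates all subsets of up to $m$ discarded points (exponential in the instance size), whereas the paper's mapping produces only $\left(\frac{k+m}{\varepsilon}\right)^{O(m)}$ derived instances. The toy 1-median solver on a line and all function names here are hypothetical stand-ins.

```python
from itertools import combinations

def outlier_free_1_median_1d(points):
    """Toy exact outlier-free solver: 1-median on a line. An optimal
    center can be taken from the input points, so try each one.
    Stand-in for whatever outlier-free algorithm the reduction invokes."""
    return min(sum(abs(p - c) for p in points) for c in points)

def with_outliers(points, m, solve_outlier_free):
    """Schematic reduction: map the outlier instance to a family of
    outlier-free instances (here: naively, every way of discarding at
    most m points), solve each, and return the cheapest cost. The
    paper replaces this brute-force enumeration with a structured,
    ((k+m)/eps)^{O(m)}-sized family of instances."""
    best = float("inf")
    for r in range(m + 1):
        for outliers in combinations(range(len(points)), r):
            rest = [p for i, p in enumerate(points) if i not in outliers]
            if rest:
                best = min(best, solve_outlier_free(rest))
    return best
```

On the toy instance `[0, 1, 2, 100]` with `m = 1`, discarding the point `100` and centering at `1` is optimal, while with `m = 0` the far point forces a much larger cost; the wrapper finds both without the outlier-free solver ever knowing about outliers.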
Problem

Research questions and friction points this paper is trying to address.

Outlier handling in constrained k-median and k-means clustering.
General framework for reducing outlier problems with minimal loss.
First FPT approximation algorithms for capacitated outlier clustering.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces outlier problems to outlier-free versions
Uses FPT approximation for constrained clustering
Applies to Euclidean and arbitrary metric spaces
Ragesh Jaiswal
Indian Institute of Technology Delhi
Theoretical Computer Science, Cryptography, Machine Learning
Amit Kumar
Department of Computer Science and Engineering, Indian Institute of Technology Delhi