Clustering What Matters in Constrained Settings

📅 2023-04-29
🏛️ International Symposium on Algorithms and Computation
📈 Citations: 4
✨ Influential: 1
📄 PDF
🤖 AI Summary
This paper studies constrained clustering with outliers: given a metric space, compute a $k$-median or $k$-means solution that satisfies hard capacity constraints while discarding at most $m$ outlier points. We propose the first generic reduction framework that systematically converts the outlier version into the outlier-free version with a $(1+\varepsilon)$-approximation guarantee. Our method combines instance enumeration, distance-space mapping, and parameterized algorithm design, and applies to arbitrary metric spaces. Specifically, it yields an FPT $(1+\varepsilon)$-approximation in Euclidean space; FPT $(3+\varepsilon)$- and $(9+\varepsilon)$-approximations in general metric spaces; and, notably, the first $(2-\delta)$-approximation for the Ulam metric, breaking the long-standing 2-approximation barrier. These results substantially broaden the theoretical boundaries and tractability of constrained clustering with outliers.
๐Ÿ“ Abstract
Constrained clustering problems generalize classical clustering formulations, e.g., $k$-median, $k$-means, by imposing additional constraints on the feasibility of clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out $m$ points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained $k$-median or $k$-means problem to the corresponding outlier-free version with only a $(1+\varepsilon)$-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to $f(k, m, \varepsilon)$ instances of the outlier-free version, where $f(k, m, \varepsilon) = \left( \frac{k+m}{\varepsilon} \right)^{O(m)}$. As specific applications, we get the following results:

- First FPT (in the parameters $k$ and $m$) $(1+\varepsilon)$-approximation algorithm for the outlier version of capacitated $k$-median and $k$-means in Euclidean spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation algorithms for the outlier version of capacitated $k$-median and $k$-means, respectively, in general metric spaces with hard capacities.
- First FPT (in the parameters $k$ and $m$) $(2-\delta)$-approximation algorithm for the outlier version of the $k$-median problem under the Ulam metric.

Our work generalizes the known results to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
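The shape of the reduction, solving many outlier-free instances and keeping the cheapest answer, can be illustrated with a deliberately naive sketch. This is not the paper's construction: the code below enumerates all subsets of up to $m$ discarded points (exponential in the instance size), whereas the paper's mapping produces only $\left(\frac{k+m}{\varepsilon}\right)^{O(m)}$ derived instances. The toy 1-median solver on a line and all function names here are hypothetical stand-ins.

```python
from itertools import combinations

def outlier_free_1_median_1d(points):
    """Toy exact outlier-free solver: 1-median on a line. An optimal
    center can be taken from the input points, so try each one.
    Stand-in for whatever outlier-free algorithm the reduction invokes."""
    return min(sum(abs(p - c) for p in points) for c in points)

def with_outliers(points, m, solve_outlier_free):
    """Schematic reduction: map the outlier instance to a family of
    outlier-free instances (here: naively, every way of discarding at
    most m points), solve each, and return the cheapest cost. The
    paper replaces this brute-force enumeration with a structured,
    ((k+m)/eps)^{O(m)}-sized family of instances."""
    best = float("inf")
    for r in range(m + 1):
        for outliers in combinations(range(len(points)), r):
            rest = [p for i, p in enumerate(points) if i not in outliers]
            if rest:
                best = min(best, solve_outlier_free(rest))
    return best
```

On the toy instance `[0, 1, 2, 100]` with `m = 1`, discarding the point `100` and centering at `1` is optimal, while with `m = 0` the far point forces a much larger cost; the wrapper finds both without the outlier-free solver ever knowing about outliers.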
Problem

Research questions and friction points this paper is trying to address.

Outlier handling in constrained k-median and k-means clustering.
General framework for reducing outlier problems with minimal loss.
First FPT approximation algorithms for capacitated outlier clustering.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces outlier problems to outlier-free versions
Uses FPT approximation for constrained clustering
Applies to Euclidean and arbitrary metric spaces
Ragesh Jaiswal
Indian Institute of Technology Delhi
Theoretical Computer Science, Cryptography, Machine Learning
Amit Kumar
Department of Computer Science and Engineering, Indian Institute of Technology Delhi