Dimension-Free Parameterized Approximation Schemes for Hybrid Clustering

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the Hybrid $k$-Clustering problem: select $k$ centers from $n$ points to minimize the sum of $r$-distances, defined as $\max\{d(p,q)-r, 0\}$, from each point to its nearest center, thereby unifying $k$-Center and $k$-Median. We propose the first dimension-independent FPT approximation algorithm, eliminating the exponential dependence on the ambient dimension $d$. Our approach extends to generalized metric spaces, including doubling metrics, minor-free graphs, and bounded-treewidth graphs. We construct the first coreset for Hybrid Clustering in doubling spaces. Leveraging parameterized algorithm design, metric embedding theory, coreset construction, and local search, we achieve a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation, i.e., one that violates both the cost and the radius $r$ by a factor of at most $1+\varepsilon$, with running time FPT in $k$ and $1/\varepsilon$. The algorithm further generalizes to the $z$-power objective $\sum_p \max\{d(p, c(p))-r, 0\}^z$, where $c(p)$ is the center nearest to $p$, supporting broader robust clustering formulations.

📝 Abstract
Hybrid $k$-Clustering is a model of clustering that generalizes two of the most widely studied clustering objectives: $k$-Center and $k$-Median. In this model, given a set of $n$ points $P$, the goal is to find $k$ centers such that the sum of the $r$-distances of each point to its nearest center is minimized. The $r$-distance between two points $p$ and $q$ is defined as $\max\{d(p, q)-r, 0\}$ -- this represents the distance of $p$ to the boundary of the $r$-radius ball around $q$ if $p$ is outside the ball, and $0$ otherwise. This problem was recently introduced by Fomin et al. [APPROX 2024], who designed a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation that runs in time $2^{(kd/\varepsilon)^{O(1)}} \cdot n^{O(1)}$ for inputs in $\mathbb{R}^d$; such a bicriteria solution uses balls of radius $(1+\varepsilon)r$ instead of $r$, and has cost at most $1+\varepsilon$ times the cost of an optimal solution using balls of radius $r$. In this paper we significantly improve upon this result by designing an approximation algorithm with the same bicriteria guarantee, but with a running time that is FPT only in $k$ and $\varepsilon$ -- crucially, removing the exponential dependence on the dimension $d$. This resolves an open question posed in their paper. Our results extend further in several directions. First, our approximation scheme works in a broader class of metric spaces, including doubling spaces, minor-free, and bounded-treewidth metrics. Second, our techniques yield similar bicriteria FPT approximation schemes for other variants of Hybrid $k$-Clustering, e.g., when the objective features the sum of the $z$-th powers of the $r$-distances. Finally, we also design a coreset for Hybrid $k$-Clustering in doubling spaces, answering another open question from the work of Fomin et al.
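The objective in the abstract is easy to state concretely. The sketch below (my own illustration, not the paper's algorithm; the function names and Euclidean metric are assumptions) evaluates the Hybrid $k$-Clustering cost of a fixed set of centers, including the $z$-power variant:

```python
import math

def r_distance(p, q, r):
    """r-distance max{d(p, q) - r, 0}: distance from p to the boundary
    of the r-radius ball around q, or 0 if p lies inside the ball.
    Here d is the Euclidean metric (an assumption for illustration)."""
    return max(math.dist(p, q) - r, 0.0)

def hybrid_cost(points, centers, r, z=1):
    """Sum over all points of the z-th power of the r-distance to the
    nearest center; z = 1 gives the basic Hybrid k-Clustering objective."""
    return sum(min(r_distance(p, c, r) for c in centers) ** z
               for p in points)
```

Note how the two classical objectives fall out as special cases: with $r = 0$ the cost is exactly the $k$-Median cost, while checking whether the cost is $0$ for a given $r$ is the $k$-Center decision question.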
Problem

Research questions and friction points this paper is trying to address.

High-dimensional Data Clustering
Efficient Algorithms
Approximation Methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

k-clustering
dimension-independent runtime
coreset construction in doubling metrics