🤖 AI Summary
This work addresses a limitation of classical correlation clustering (CC), which supports only binary similarity relations, by studying two generalized settings: chromatic correlation clustering (CCC), in which edges carry semantic labels (colors), and pseudometric-weighted CC, in which edge weights satisfy the triangle inequality. We develop approximation algorithms for both settings that combine linear programming (LP) relaxation, a pivoting strategy, and problem-specific rounding functions. For pseudometric-weighted CC, we obtain a tight 10/3-approximation, matching the best bound achievable with the standard LP relaxation and specialized rounding. For CCC, we improve the approximation ratio from 2.5 to 2.15 and establish a lower bound of 2.11 within the same analytical framework, which indicates that the result is near-optimal for this approach.
📝 Abstract
Correlation Clustering (CC) is a foundational problem in unsupervised learning that models binary similarity relations using labeled graphs. While classical CC has been widely studied, many real-world applications involve more nuanced relationships, such as multi-class categorical interactions or varying confidence levels in edge labels. To address these, two natural generalizations have been proposed: Chromatic Correlation Clustering (CCC), which labels edges with semantic colors, and pseudometric-weighted CC, which allows edge weights satisfying the triangle inequality. In this paper, we develop improved approximation algorithms for both settings. Our approach leverages LP-based pivoting techniques combined with problem-specific rounding functions. For the pseudometric-weighted correlation clustering problem, we present a tight $10/3$-approximation algorithm, matching the best possible bound achievable within the framework of the standard LP relaxation combined with specialized rounding. For the CCC problem, we improve the approximation ratio from the previous best of $2.5$ to $2.15$, and we establish a lower bound of $2.11$ within the same analytical framework, highlighting the near-optimality of our result.
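To make "LP-based pivoting combined with rounding functions" concrete, the following is a minimal sketch of the generic pivot-and-round scheme for classical signed CC. It is not the paper's algorithm. It assumes an LP solution is already available as pairwise distances `x` in [0, 1], and the function name `lp_pivot`, the parameters `f_plus` and `f_minus`, and the identity rounding used in the example are hypothetical placeholders; the specific rounding functions behind the $10/3$ and $2.15$ ratios are not given in this abstract.

```python
import random


def lp_pivot(vertices, x, sign, f_plus, f_minus, rng=None):
    """Generic LP-pivot rounding sketch (illustrative, not the paper's method).

    vertices: iterable of hashable vertex ids.
    x:        dict mapping an unordered pair (u, v) to an LP distance in [0, 1].
    sign:     dict mapping the same pairs to +1 (similar) or -1 (dissimilar).
    f_plus, f_minus: rounding functions from [0, 1] to a cut probability in [0, 1].
    """
    rng = rng or random.Random()
    remaining = list(vertices)
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)  # pick a uniformly random pivot
        cluster = [pivot]
        rest = []
        for v in remaining:
            if v == pivot:
                continue
            # Pairs are stored once; look up whichever orientation exists.
            key = (pivot, v) if (pivot, v) in x else (v, pivot)
            f = f_plus if sign[key] > 0 else f_minus
            p_cut = f(x[key])  # probability of separating v from the pivot
            if rng.random() >= p_cut:
                cluster.append(v)  # v joins the pivot's cluster
            else:
                rest.append(v)  # v stays for a later round
        clusters.append(cluster)
        remaining = rest
    return clusters


# Toy instance: an integral LP solution encoding clusters {0, 1} and {2, 3}.
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
x = {p: 0.0 if p in [(0, 1), (2, 3)] else 1.0 for p in pairs}
sign = {p: 1 if x[p] == 0.0 else -1 for p in pairs}
identity = lambda d: d  # placeholder rounding function

clusters = lp_pivot(range(4), x, sign, identity, identity, random.Random(0))
found = {frozenset(c) for c in clusters}
```

Because the toy LP solution is integral, every cut probability is 0 or 1, so the sketch recovers the two encoded clusters regardless of which pivots are drawn. With a fractional solution, the choice of `f_plus` and `f_minus` is what determines the approximation ratio in this style of analysis.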