AI Summary
This work addresses the challenge of balancing constraint satisfaction and computational efficiency in pairwise-constrained clustering, particularly in large-scale and quantum or quantum-hybrid settings. To this end, the authors propose the PASS framework, which introduces an ambiguity-guided subset selection mechanism. By collapsing must-link constraints into pseudo-points, applying a constraint-aware margin rule that collects near-boundary points and detected cannot-link violations, and scoring points with a Fisher-Rao distance derived from soft assignment posteriors, PASS identifies high-information subsets under a limited budget. Experimental results indicate that PASS achieves a sum of squared errors (SSE) competitive with exact and penalty-based methods at substantially lower computational cost across multiple benchmark datasets. It also remains effective in regimes where prior approaches fail, while scaling well and preserving satisfaction of both must-link and cannot-link constraints.
Abstract
Pairwise-constrained clustering augments unsupervised partitioning with side information by enforcing must-link (ML) and cannot-link (CL) constraints between specific samples, yielding labelings that respect known affinities and separations. However, ML and CL constraints add an extra layer of complexity to the clustering problem, and current methods struggle to scale with data size, especially in niche applications such as quantum or quantum-hybrid clustering. We propose PASS, a pairwise-constraint and ambiguity-driven subset selection framework that preserves ML and CL constraint satisfaction while enabling scalable, high-quality clustering solutions. PASS collapses ML constraints into pseudo-points and offers two selectors: a constraint-aware margin rule that collects near-boundary points and all detected CL violations, and an information-geometric rule that scores points via a Fisher-Rao distance derived from soft assignment posteriors, then selects the highest-information subset under a simple budget. Across diverse benchmarks, PASS attains competitive SSE at substantially lower cost than exact or penalty-based methods, and remains effective in regimes where prior approaches fail.
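To make the two building blocks in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: ML components are merged with a union-find and each pseudo-point is the mean of its members; soft posteriors come from a softmax over negative squared distances to given centroids; and the ambiguity score of a point is the Fisher-Rao distance between its posterior and the nearest hard (one-hot) assignment. All function names and the `temperature` parameter are hypothetical. The only standard fact used is the closed form of the Fisher-Rao distance between two categorical distributions, 2·arccos(Σ_k sqrt(p_k q_k)).

```python
import numpy as np

def collapse_must_links(X, must_links):
    """Merge each must-link connected component into one pseudo-point.
    Assumption: a pseudo-point is the mean of its member samples."""
    n = len(X)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    roots = np.array([find(i) for i in range(n)])
    # group[i] is the pseudo-point index of original sample i
    _, group, counts = np.unique(roots, return_inverse=True, return_counts=True)
    pseudo = np.zeros((len(counts), X.shape[1]))
    np.add.at(pseudo, group, X)      # sum members per component
    pseudo /= counts[:, None]        # turn sums into means
    return pseudo, group, counts

def soft_posteriors(X, centroids, temperature=1.0):
    """Assumed soft assignment: softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def fisher_rao(p, q):
    """Fisher-Rao distance between categorical distributions (row-wise)."""
    bc = np.sum(np.sqrt(p * q), axis=-1)         # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))

def select_ambiguous(posteriors, budget):
    """Assumed scoring: distance from each posterior to its nearest one-hot
    assignment; the `budget` most ambiguous points are kept."""
    k = posteriors.shape[1]
    hard = np.eye(k)[posteriors.argmax(axis=1)]
    scores = fisher_rao(posteriors, hard)
    order = np.argsort(-scores, kind="stable")
    return order[:budget], scores

# Tiny usage example: samples 0 and 1 are must-linked.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [5.0, 0.0]])
pseudo, group, counts = collapse_must_links(X, [(0, 1)])
centroids = np.array([[0.0, 1.0], [10.0, 0.0]])
post = soft_posteriors(pseudo, centroids)
chosen, scores = select_ambiguous(post, budget=1)
```

In this toy example the pseudo-point at (5, 0) sits almost midway between the two centroids, so it receives the largest score and is the one selected. Under the one-hot reference used here, a confident posterior scores close to 0 and a uniform two-cluster posterior scores π/2; a different reference distribution would change the ranking rule but not the distance formula.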