🤖 AI Summary
On NISQ devices, quantum k-means clustering suffers from high data-loading overhead and a qubit count that scales with the number of samples, which severely limits practical deployment.
Method: We propose a hybrid quantum k-means algorithm featuring (i) a constant-dimensional, unbiased Fourier-feature sketch that compresses the data and enables efficient quantum state encoding, so that peak qubit usage does not grow with the number of samples (at most 9 qubits on the reported low-dimensional benchmarks); and (ii) an elite-preservation (elitist retention) strategy combined with shallow-depth QAOA to solve surrogate QUBO subproblems within the compressive learning framework.
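The compression and refinement steps can be illustrated with a small classical sketch. It is modeled on the usual compressive k-means formulation (random frequencies $\omega_j$, an empirical sketch $z_j = \frac{1}{N}\sum_n e^{i\omega_j^\top x_n}$) with the simplifying assumption of equal cluster weights. Shallow QAOA is replaced by a brute-force solver for a one-hot QUBO over a handful of candidate centroids per group. The function names, candidate pools, and penalty choice are hypothetical and are not the authors' implementation. The elitist check accepts a proposal only if it lowers the surrogate cost. That check keeps the cost history non-increasing even when an approximate solver such as QAOA returns a poor or infeasible bitstring.

```python
import itertools
import numpy as np

def fourier_sketch(points, omega):
    """Empirical Fourier-feature sketch z_j = mean_n exp(i * omega_j . x_n).
    Its length B = omega.shape[0] does not depend on the number of points."""
    return np.exp(1j * (points @ omega.T)).mean(axis=0)

def surrogate_cost(centroids, z, omega):
    """Squared mismatch between the data sketch and an equal-weight centroid sketch
    (equal weights are a simplifying assumption)."""
    return float(np.sum(np.abs(z - fourier_sketch(centroids, omega)) ** 2))

def solve_one_hot_qubo(linear, penalty):
    """Brute-force stand-in for shallow QAOA (hypothetical helper): minimize
    sum_m linear[m] * x_m + penalty * (sum_m x_m - 1)^2 over x in {0, 1}^M."""
    best_x, best_val = None, np.inf
    for bits in itertools.product((0, 1), repeat=len(linear)):
        x = np.array(bits)
        val = float(linear @ x + penalty * (x.sum() - 1) ** 2)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

def refine(z, omega, centroids, candidate_groups, sweeps=3):
    """Per-group refinement with elitist retention: a proposal replaces the current
    centroids only if it strictly lowers the surrogate cost."""
    centroids = centroids.copy()
    cost = surrogate_cost(centroids, z, omega)
    history = [cost]
    for _ in range(sweeps):
        for g, cands in enumerate(candidate_groups):
            # Linear QUBO coefficients: surrogate cost if group g takes candidate m.
            trial_costs = []
            for c in cands:
                trial = centroids.copy()
                trial[g] = c
                trial_costs.append(surrogate_cost(trial, z, omega))
            linear = np.array(trial_costs)
            # Penalty larger than any linear term makes a one-hot bitstring optimal.
            x = solve_one_hot_qubo(linear, penalty=1.0 + 2.0 * linear.max())
            if x.sum() == 1:  # decode only feasible (one-hot) bitstrings
                proposal = centroids.copy()
                proposal[g] = cands[int(np.argmax(x))]
                new_cost = surrogate_cost(proposal, z, omega)
                if new_cost < cost:  # elitist retention: never accept a worse solution
                    centroids, cost = proposal, new_cost
            history.append(cost)
    return centroids, cost, history

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (500, 2)), rng.normal(2.0, 0.3, (500, 2))])
omega = rng.normal(0.0, 1.0, (16, 2))  # B = 16 random frequencies (illustrative)
z = fourier_sketch(X, omega)           # the only pass over the N = 1000 points
groups = [X[rng.choice(len(X), 4, replace=False)] for _ in range(2)]  # 4 candidates per group
init = np.array([g[0] for g in groups])
C, cost, history = refine(z, omega, init, groups)
print(f"initial cost {history[0]:.4f} -> final cost {cost:.4f}")
```

Because the sketch has length $B$ regardless of $N$, the dataset is read once and every later step works only with $z$. With 4 candidates per group, each QUBO has 4 binary variables. Under an assumed one-qubit-per-variable encoding, that is the width a QAOA circuit would need.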
Contribution/Results: Experiments with the Qiskit Aer simulator on low-dimensional synthetic benchmarks and nine real-world datasets show that the method achieves sum-of-squared errors competitive with quantum baselines while using at most 9 qubits on the synthetic benchmarks, keeps peak qubit usage constant on the real datasets, and maintains clustering accuracy under IBM noise models close to the idealized setting. The result is a k-means variant whose peak qubit count does not grow with the number of samples, with competitive clustering quality in simulation.
📝 Abstract
Clustering on NISQ hardware is constrained by data loading and limited qubits. We present \textbf{qc-kmeans}, a hybrid compressive $k$-means that summarizes a dataset with a constant-size Fourier-feature sketch and selects centroids by solving small per-group QUBOs with shallow QAOA circuits. The QFF sketch estimator is unbiased with mean-squared error $O(\varepsilon^2)$ for $B,S=\Theta(\varepsilon^{-2})$, and the peak-qubit requirement $q_{\text{peak}}=\max\{D,\lceil \log_2 B\rceil + 1\}$ does not scale with the number of samples. A refinement step with elitist retention ensures non-increasing surrogate cost. In Qiskit Aer simulations (depth $p{=}1$), the method ran with $\le 9$ qubits on low-dimensional synthetic benchmarks and achieved competitive sum-of-squared errors relative to quantum baselines; runtimes are not directly comparable. On nine real datasets (up to $4.3\times 10^5$ points), the pipeline maintained constant peak-qubit usage in simulation. Under IBM noise models, accuracy was similar to the idealized setting. Overall, qc-kmeans offers a NISQ-oriented formulation with shallow, bounded-width circuits and competitive clustering quality in simulation.
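The qubit formula and the $O(\varepsilon^2)$ error scaling can be checked numerically. The first function evaluates $q_{\text{peak}}$ as stated; for example, $D=2$ and $B=256$ give $\max\{2, 8+1\} = 9$. The second function models the real part of a single sketch entry under an assumed Hadamard-test-style readout: each shot picks a data point uniformly and returns $\pm 1$ with mean $\cos(\omega^\top x)$. That shot model, the function names, and the constants are illustrative assumptions. The abstract specifies neither the authors' circuit nor how the error is normalized over the $B$ entries. Under this model the shot average is unbiased and its variance is at most $1/S$, so $S = \lceil \varepsilon^{-2}\rceil$ shots give a mean-squared error of at most $\varepsilon^2$ per entry.

```python
import math
import numpy as np

def peak_qubits(D, B):
    """q_peak = max{D, ceil(log2 B) + 1}; the number of samples N does not appear."""
    return max(D, math.ceil(math.log2(B)) + 1)

def estimate_cos_feature(X, omega, shots, rng):
    """Assumed shot model (Hadamard-test style, hypothetical): each shot draws a data
    point uniformly and returns +1 with probability (1 + cos(omega . x)) / 2, else -1.
    The shot average is unbiased for mean_n cos(omega . x_n), with variance <= 1/shots."""
    idx = rng.integers(len(X), size=shots)
    p_plus = (1.0 + np.cos(X[idx] @ omega)) / 2.0
    outcomes = np.where(rng.random(shots) < p_plus, 1.0, -1.0)
    return outcomes.mean()

print(peak_qubits(2, 256))  # 9
print(peak_qubits(8, 100))  # 8

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 2))
omega = np.array([0.7, -0.4])
target = np.cos(X @ omega).mean()  # exact real part of this sketch entry

eps = 0.0625                # exactly representable, so eps ** -2 is exactly 256
S = math.ceil(eps ** -2)    # 256 shots
estimates = np.array([estimate_cos_feature(X, omega, S, rng) for _ in range(2000)])
mse = np.mean((estimates - target) ** 2)
print(f"S={S}, empirical MSE={mse:.5f}, bound eps^2={eps**2:.5f}")
```

Halving $\varepsilon$ quadruples the shot count, which matches the $S=\Theta(\varepsilon^{-2})$ scaling in the abstract. Note that $N$ enters neither the qubit count nor the shot count.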