Khatri-Rao Clustering for Data Summarization

📅 2026-02-17
🤖 AI Summary
This work addresses the trade-off between conciseness and accuracy in centroid-based clustering when the number of clusters is large. It proposes the Khatri-Rao clustering paradigm, which assumes that cluster centroids arise from the interaction of two or more compact sets of protocentroids, yielding a more succinct but equally accurate data summary. The authors develop a Khatri-Rao k-Means algorithm and a Khatri-Rao deep clustering framework. Experiments show that Khatri-Rao k-Means strikes a more favorable trade-off between succinctness and accuracy than standard k-Means, and that the deep clustering framework, by leveraging representation learning, further reduces the size of the summaries produced by deep clustering while preserving their accuracy.

📝 Abstract
As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to this challenge, summarizes a dataset in terms of a few prototypes, each representing a cluster in the data. Despite the wide adoption of this approach, the resulting summaries often contain redundancies, which limits their effectiveness, particularly in datasets with a large number of underlying clusters. To overcome this limitation, we introduce the Khatri-Rao clustering paradigm, which extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries by postulating that centroids arise from the interaction of two or more succinct sets of protocentroids. We study two central approaches to centroid-based clustering, namely the well-established k-Means algorithm and the increasingly popular topic of deep clustering, through the lens of the Khatri-Rao paradigm. To this end, we introduce the Khatri-Rao k-Means algorithm and the Khatri-Rao deep clustering framework. Extensive experiments show that Khatri-Rao k-Means can strike a more favorable trade-off between succinctness and accuracy in data summarization than standard k-Means. Leveraging representation learning, the Khatri-Rao deep clustering framework offers even greater benefits, further reducing the size of the data summaries given by deep clustering while preserving their accuracy.
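The idea that centroids arise from the interaction of sets of protocentroids can be illustrated with a minimal sketch. The abstract does not specify how protocentroids are combined, so the element-wise sum and product aggregators below are assumptions, and all names (`khatri_rao_centroids`, `combine_sum`, `combine_product`, `nearest_centroid`) are hypothetical. This is not the paper's algorithm; it only shows how a few stored vectors can represent a larger set of centroids.

```python
from itertools import product
from typing import Callable, List, Sequence

Vector = List[float]


def combine_sum(a: Vector, b: Vector) -> Vector:
    # Assumed aggregator: element-wise sum of two protocentroids.
    return [x + y for x, y in zip(a, b)]


def combine_product(a: Vector, b: Vector) -> Vector:
    # Alternative assumed aggregator: element-wise product.
    return [x * y for x, y in zip(a, b)]


def khatri_rao_centroids(
    protocentroid_sets: Sequence[Sequence[Vector]],
    combine: Callable[[Vector, Vector], Vector] = combine_sum,
) -> List[Vector]:
    # Build one centroid for every way of picking one protocentroid per set.
    centroids = []
    for combo in product(*protocentroid_sets):
        centroid = list(combo[0])
        for proto in combo[1:]:
            centroid = combine(centroid, proto)
        centroids.append(centroid)
    return centroids


def nearest_centroid(point: Vector, centroids: Sequence[Vector]) -> int:
    # Standard centroid-based assignment: index of the closest centroid
    # under squared Euclidean distance.
    def sq_dist(c: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(point, c))

    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


# Toy protocentroid sets (illustrative values, not from the paper).
set_a = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
set_b = [[0.0, 0.0], [1.0, 1.0]]

centroids = khatri_rao_centroids([set_a, set_b])
stored_vectors = len(set_a) + len(set_b)   # size of the summary
represented_centroids = len(centroids)     # number of centroids it encodes
```

In this toy setup, 3 + 2 = 5 stored vectors encode 3 × 2 = 6 centroids. The gap widens as the sets grow: two sets of 10 protocentroids would encode 100 centroids from 20 stored vectors, which is the kind of saving in summary size that the abstract refers to.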
Problem

Research questions and friction points this paper is trying to address.

data summarization
centroid-based clustering
redundancy
Khatri-Rao clustering
succinctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Khatri-Rao clustering
data summarization
protocentroids
k-Means
deep clustering