Wasserstein-based Kernels for Clustering: Application to Power Distribution Graphs

📅 2025-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the clustering of non-vectorial data (e.g., graphs) whose objects carry both distributional and vectorial information, and proposes a scalable framework for representing such objects and measuring their similarity. The framework has three parts: (i) kernels built on Wasserstein distances that can be composed with other kernels, so that discrete probability distributions and Euclidean vectors are modeled together; (ii) an efficient approximation of pairwise Wasserstein distances based on multiple reference distributions; and (iii) scalable, distance-agnostic clustering validity indices. A case study on two datasets of 879 and 34,920 power distribution graphs demonstrates the framework's effectiveness and efficiency.

📝 Abstract

Many data clustering applications must handle objects that cannot be represented as vector data. In this context, the bag-of-vectors representation can be leveraged to describe complex objects through discrete distributions, and the Wasserstein distance can effectively measure the dissimilarity between them. Additionally, kernel methods can be used to embed data into feature spaces that are easier to analyze. Despite significant progress in data clustering, a method that simultaneously accounts for distributional and vectorial dissimilarity measures is still lacking. To tackle this gap, this work explores kernel methods and Wasserstein distance metrics to develop a computationally tractable clustering framework. The compositional properties of kernels allow the simultaneous handling of different metrics, enabling the integration of both vectors and discrete distributions for object representation. This approach is flexible enough to be applied in various domains, such as graph analysis and image processing. The framework consists of three main components. First, we efficiently approximate pairwise Wasserstein distances using multiple reference distributions. Second, we employ kernel functions based on Wasserstein distances and present ways of composing kernels to express different types of information. Finally, we use the kernels to cluster data and evaluate the quality of the results using scalable and distance-agnostic validity indices. A case study involving two datasets of 879 and 34,920 power distribution graphs demonstrates the framework's effectiveness and efficiency.
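The first two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional bags of equal size with uniform weights (so the 1-Wasserstein distance reduces to a comparison of sorted samples), an exponential kernel `exp(-gamma * W)`, and a simple triangle-inequality lower bound as one possible way to use reference distributions. All function names, the toy data, and the `gamma` values are hypothetical.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Exact W1 between two equal-size 1-D samples with uniform weights."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal-size bags")
    return float(np.mean(np.abs(a - b)))

def pairwise_wasserstein(bags):
    """Exact pairwise W1 matrix (quadratic number of distance evaluations)."""
    n = len(bags)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_1d(bags[i], bags[j])
    return D

def reference_lower_bound(bags, refs):
    """Assumed stand-in for a reference-based approximation: by the triangle
    inequality, |W(a, r) - W(b, r)| <= W(a, b) for every reference r, so the
    maximum over references is a lower bound that needs only n * len(refs)
    distance evaluations."""
    E = np.array([[wasserstein_1d(b, r) for r in refs] for b in bags])
    return np.max(np.abs(E[:, None, :] - E[None, :, :]), axis=-1)

def wasserstein_kernel(D, gamma=1.0):
    """Exponential kernel on a Wasserstein distance matrix."""
    return np.exp(-gamma * D)

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel on ordinary feature vectors."""
    X = np.asarray(X, float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def compose(K_a, K_b, mode="product", weight=0.5):
    """Sums and elementwise products of PSD kernels are again PSD."""
    if mode == "product":
        return K_a * K_b
    if mode == "sum":
        return weight * K_a + (1.0 - weight) * K_b
    raise ValueError(f"unknown mode: {mode}")

# Toy objects: each has a bag of scalars (distribution) and a feature vector.
rng = np.random.default_rng(0)
bags = [rng.normal(loc=m, size=20) for m in (0.0, 0.1, 3.0, 3.2)]
vectors = np.array([[0.0, 1.0], [0.1, 0.9], [2.0, 2.0], [2.1, 1.9]])
refs = [np.linspace(-2.0, 2.0, 20), np.linspace(1.0, 5.0, 20)]

D_exact = pairwise_wasserstein(bags)
D_lower = reference_lower_bound(bags, refs)
K = compose(wasserstein_kernel(D_exact), rbf_kernel(vectors), mode="product")
```

The composed matrix `K` is built from the exact distances because the exponential kernel on one-dimensional W1 is positive semidefinite; that guarantee does not automatically carry over to kernels built on an approximate distance. For bags of multi-dimensional vectors, `wasserstein_1d` would have to be replaced by a general optimal-transport solver.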

Problem

Research questions and friction points this paper is trying to address.

Handling non-vector data in clustering applications

Integrating Wasserstein distance and kernel methods

Clustering power distribution graphs effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

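Approximates pairwise Wasserstein distances between bag-of-vectors representations using multiple reference distributions

Composes Wasserstein-based kernels with other kernels so that vectors and discrete distributions jointly describe each object

Evaluates clustering results with scalable, distance-agnostic validity indices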
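To make the clustering and evaluation step concrete, here is a minimal sketch of clustering from a precomputed kernel matrix and scoring the result from pairwise dissimilarities alone. It is an illustration under stated assumptions, not the paper's method: kernel k-means and the silhouette coefficient are stand-ins chosen for brevity, the seeding rule is ad hoc, and the toy data, function names, and `gamma = 0.5` are hypothetical.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Kernel k-means on a precomputed PSD kernel matrix K."""
    n = K.shape[0]
    # Deterministic seeding (an assumption of this sketch): start from point 0,
    # then repeatedly add the point least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(int(np.argmin(K[:, seeds].max(axis=1))))
    labels = np.argmax(K[:, seeds], axis=1)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinity
        for c in range(k):
            members = labels == c
            if not members.any():
                continue
            # ||phi(x_i) - mean_c||^2 expanded with kernel evaluations only
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def kernel_distance(K):
    """Feature-space distance induced by the kernel."""
    d = np.diag(K)
    return np.sqrt(np.maximum(d[:, None] + d[None, :] - 2.0 * K, 0.0))

def silhouette_from_distances(D, labels):
    """Mean silhouette using only a dissimilarity matrix (needs >= 2 clusters)."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Toy data: two tight, well-separated groups of 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (6, 2)), rng.normal(4.0, 0.1, (6, 2))])
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)

labels = kernel_kmeans(K, k=2)
score = silhouette_from_distances(kernel_distance(K), labels)
```

Because both functions consume only a kernel or dissimilarity matrix, the same code applies unchanged whether `K` comes from vectors, from Wasserstein distances, or from a composition of the two.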
