🤖 AI Summary
This work addresses the challenges of redundancy, inconsistency, and anomalies in knowledge graphs stemming from the absence of strict schemas. While existing normalization approaches inadequately model functional dependencies involving edges and node-edge combinations, this study pioneers the extension of functional dependencies to graph-structured data by introducing a graph-native normalization framework. The authors formally define graph object functional dependencies that encompass nodes, edges, and their compositions, establish multiple graph-native normal forms, and devise corresponding graph transformation algorithms. Experimental evaluations on both synthetic and real-world knowledge graph datasets demonstrate that the proposed method significantly enhances graph consistency and overall quality, thereby establishing the first comprehensive theoretical foundation for graph-native normalization.
📝 Abstract
In recent years, knowledge graphs (KGs), in particular in the form of labeled property graphs (LPGs), have become essential components in a broad range of applications. Although the absence of strict schemas for KGs invites structural issues that lead to redundancies and, subsequently, to inconsistencies and anomalies, the problem of KG quality has so far received little attention. Inspired by normalization based on functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also exhibit functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, edges, and their combinations. We define a range of graph-native normal forms and graph object functional dependencies, and we propose algorithms for transforming graphs accordingly. We evaluate our contributions on a broad range of synthetic and native graph datasets.
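To give intuition for the kind of dependency the abstract describes, the following sketch checks a node-level functional dependency in a toy labeled property graph. The graph representation, the `Person` label, and the `zip -> city` dependency are illustrative assumptions for this example only, not the paper's formalism or its actual algorithms.

```python
# Minimal sketch (assumed toy LPG model, not the paper's method):
# detect violations of a node-level functional dependency lhs -> rhs,
# i.e. nodes with equal lhs property values but differing rhs values.

from collections import defaultdict

# Toy LPG: each node has a label and a property map.
nodes = [
    {"label": "Person", "props": {"zip": "8001", "city": "Zurich", "name": "Ada"}},
    {"label": "Person", "props": {"zip": "8001", "city": "Zurich", "name": "Bob"}},
    # Redundantly stored city name that drifted out of sync -> anomaly:
    {"label": "Person", "props": {"zip": "8001", "city": "Zuerich", "name": "Eve"}},
]

def fd_violations(nodes, label, lhs, rhs):
    """Group nodes of `label` by their `lhs` value and report groups
    whose `rhs` values disagree (violations of lhs -> rhs)."""
    groups = defaultdict(set)
    for n in nodes:
        if n["label"] == label and lhs in n["props"] and rhs in n["props"]:
            groups[n["props"][lhs]].add(n["props"][rhs])
    return {k: v for k, v in groups.items() if len(v) > 1}

print(fd_violations(nodes, "Person", "zip", "city"))
```

A graph-native normalization would resolve such a violation by factoring the dependent property out into its own node (here, a `City` node referenced by an edge), analogous to decomposing a relation in relational normalization.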