🤖 AI Summary
Existing declarative mapping languages for knowledge graphs, such as RML, lack a formal semantic foundation. As a result, their semantics are ambiguous, implementations disagree with one another, optimizations cannot be proven correct, and fundamental properties such as expressive power cannot be studied rigorously.
Method: We propose a language-agnostic mapping algebra that uniformly models declarative mappings from heterogeneous data sources to knowledge graphs. We show that RML can be translated into this algebra, which also gives RML a formal semantics, and we prove several algebraic rewrite rules over the algebra.
Contribution/Results: Our work enables correctness verification and automated optimization of mapping plans. It yields several sound equivalence-preserving optimization rules, advancing the theoretical foundations of mapping languages, facilitating robust tool implementation, and strengthening knowledge graph construction methodologies.
📝 Abstract
Although they have existed for more than ten years, have attracted diverse implementations, and have been used successfully in a significant number of applications, declarative mapping languages for constructing knowledge graphs from heterogeneous types of data sources still lack a solid formal foundation. This makes it impossible to introduce implementation and optimization techniques that are provably correct and, in fact, has led to discrepancies between different implementations. Moreover, it precludes studying fundamental properties of different languages (e.g., expressive power). To address this gap, this paper introduces a language-agnostic algebra for capturing mapping definitions. As further contributions, we show that the popular mapping language RML can be translated into our algebra (by which we also provide a formal definition of the semantics of RML) and we prove several algebraic rewriting rules that can be used to optimize mapping plans based on our algebra.
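
As a rough illustration of what an algebraic rewriting rule over a mapping plan can look like, the following Python sketch uses two hypothetical operators, `extend` and `project`, over lists of dictionaries. These operators, the sample data, and the `http://example.org/` IRIs are assumptions made for this example only; they are not the operators or rules defined in the paper. The sketch checks one plausible equivalence: an `extend` step whose new attribute is projected away afterwards can be dropped without changing the result.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Relation = List[Row]


def extend(rel: Relation, attr: str, fn: Callable[[Row], str]) -> Relation:
    """Hypothetical operator: add attribute `attr`, computed per row by `fn`."""
    return [{**row, attr: fn(row)} for row in rel]


def project(rel: Relation, attrs: List[str]) -> Relation:
    """Hypothetical operator: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rel]


def subject(row: Row) -> str:
    # Build a subject IRI from the record identifier (illustrative IRI scheme).
    return "http://example.org/person/" + row["id"]


def label(row: Row) -> str:
    # An extra derived value that the final projection never uses.
    return row["name"].upper()


# Toy source data standing in for records read from some data source.
source: Relation = [
    {"id": "1", "name": "Ada"},
    {"id": "2", "name": "Alan"},
]

# Plan A: compute both derived attributes, then keep only "s" and "name".
plan_a = project(
    extend(extend(source, "s", subject), "label", label),
    ["s", "name"],
)

# Plan B: the rewritten plan, with the unused extend step removed.
plan_b = project(extend(source, "s", subject), ["s", "name"])

# Both plans produce the same relation on this input.
assert plan_a == plan_b
```

The check above only compares the two plans on one input, so it illustrates the equivalence rather than proving it. A formal algebra is what makes it possible to prove such a rule for every input.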