🤖 AI Summary
This work addresses the challenge of natively representing and efficiently querying tensor data within RDF-based systems, a need that arises from the integration of knowledge graphs (KGs) and machine learning (ML). Methodologically, it (1) introduces a lightweight RDF tensor literal syntax and serialization format; (2) defines 36 tensor-specific SPARQL functions and four tensor-aware aggregates; and (3) implements an open-source SPARQL engine on Apache Jena, incorporating RDF Schema extensions, SPARQL 1.2 syntax enhancements, and optimized tensor indexing. Experimental results demonstrate substantial improvements in both query efficiency and expressive power for joint KG–embedding-space queries. The contribution includes a publicly available benchmark suite, exemplar tensor-augmented knowledge graphs, and a comprehensive validation framework. To the best of our knowledge, this is the first production-ready, tensor-aware knowledge graph infrastructure designed specifically to support ML-KG hybrid applications.
📝 Abstract
Embedding tensors in databases has recently gained significance due to the rapid proliferation of machine learning methods (including LLMs) that produce embeddings in the form of tensors. To support emerging use cases hybridizing machine learning with knowledge graphs, a robust and efficient tensor representation scheme is needed. We introduce a novel approach for representing data tensors as literals in RDF, along with an extension of SPARQL implementing specialized functionalities for handling such literals. The extension includes 36 SPARQL functions and four aggregates. To support this approach, we provide a thoroughly tested, open-source implementation based on Apache Jena, along with an example knowledge graph and query set.
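To make the idea of a tensor stored as an RDF literal more concrete, the following is a minimal Python sketch, not the paper's actual syntax or API. It assumes a hypothetical datatype IRI (`http://example.org/datatype#tensor`) and a hypothetical lexical form (a JSON nested array); the helper names `to_literal`, `from_literal`, and `cosine_similarity` are invented for illustration. The similarity function only hints at the kind of operation a tensor-aware query function could perform on such literals.

```python
import json
import math

# Hypothetical datatype IRI, used for illustration only (not from the paper).
TENSOR_DATATYPE = "http://example.org/datatype#tensor"


def flatten(nested):
    """Flatten an arbitrarily nested list of numbers in row-major order."""
    if isinstance(nested, list):
        out = []
        for item in nested:
            out.extend(flatten(item))
        return out
    return [nested]


def infer_shape(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def to_literal(nested):
    """Write a nested list as a typed literal in a Turtle-like notation.

    Assumed lexical form: a JSON nested array, which contains no quotes
    or backslashes and therefore needs no escaping here.
    """
    return f'"{json.dumps(nested)}"^^<{TENSOR_DATATYPE}>'


def from_literal(literal):
    """Parse a literal produced by to_literal back into a nested list."""
    lexical, _, datatype = literal.rpartition("^^")
    if datatype != f"<{TENSOR_DATATYPE}>":
        raise ValueError(f"unexpected datatype: {datatype}")
    return json.loads(lexical[1:-1])  # strip the surrounding double quotes


def cosine_similarity(a, b):
    """Cosine similarity of two tensors with the same number of elements."""
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(fa, fb))
    norm_a = math.sqrt(sum(x * x for x in fa))
    norm_b = math.sqrt(sum(y * y for y in fb))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    lit = to_literal([[1, 2], [3, 4]])
    print(lit)
    tensor = from_literal(lit)
    print(infer_shape(tensor))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Keeping the whole tensor in a single literal means that one triple carries the full embedding, so a query function can operate on it directly instead of reassembling it from many triples. A real implementation would also need to validate the shape and element type, which this sketch omits.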