🤖 AI Summary
To address the high computational cost of gradient computation, excessive memory consumption, and poor scalability of translational knowledge graph (KG) embedding training on large-scale datasets, this paper proposes a general, efficient training framework based on sparse-dense matrix multiplication (SpMM). The method unifies the multiple scatter/gather operations in embedding updates into sparse-dense matrix multiplications, replacing the core embedding computation with SpMM kernels on both CPU and GPU. The framework is implemented for four translational models (TransE, TransR, TransH, and TorusE) and extends naturally to other translational and non-translational models. The implementation achieves up to 5.3× and 4.2× training speedup on CPU and GPU, respectively, while significantly reducing GPU memory footprint, and the speedups are consistent across both small- and large-scale KG benchmarks.
📝 Abstract
Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embeddings can take a significant amount of time, especially on larger datasets. Our analysis shows that embedding gradient computation is one of the dominant functions in the translation-based KG embedding training loop. We address this issue by replacing the core embedding computation with SpMM (sparse-dense matrix multiplication) kernels, which lets us unify multiple scatter (and gather) operations into a single operation, reducing both training time and memory usage. We create a general framework for training KG models using sparse kernels and implement four models: TransE, TransR, TransH, and TorusE. Our sparse implementations achieve up to 5.3x speedup on the CPU and up to 4.2x speedup on the GPU with a significantly lower GPU memory footprint. The speedups are consistent across large and small datasets for a given model. Our proposed sparse approach can also be extended to accelerate other translation-based models (such as TransC and TransM) and non-translational models (such as DistMult, ComplEx, and RotatE).
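The core trick of unifying scatter/gather embedding updates into a single sparse-dense multiplication can be illustrated with a minimal NumPy/SciPy sketch. This is not the paper's implementation; the entity indices, batch size, and dimensions below are invented for illustration. The key observation is that accumulating per-triple gradients into the entity embedding table (a scatter-add with duplicate indices) equals one SpMM with a sparse incidence matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
num_entities, dim, batch = 6, 4, 5

# Per-triple gradients w.r.t. head-entity embeddings (dense, batch x dim).
grads = rng.normal(size=(batch, dim))
# Head-entity index of each triple in the batch (note the repeats).
heads = np.array([0, 2, 2, 5, 0])

# Baseline: many scatter-add operations into the embedding table.
# np.add.at accumulates correctly even when indices repeat.
acc_scatter = np.zeros((num_entities, dim))
np.add.at(acc_scatter, heads, grads)

# SpMM formulation: build a sparse incidence matrix S of shape
# (num_entities x batch) with S[e, i] = 1 iff triple i has head e.
# Then S @ grads performs all the scatter-adds in one kernel call.
S = csr_matrix((np.ones(batch), (heads, np.arange(batch))),
               shape=(num_entities, batch))
acc_spmm = S @ grads

assert np.allclose(acc_scatter, acc_spmm)
```

The same incidence-matrix construction applies to tail entities and relations, and its transpose expresses the gather (embedding lookup) in the forward pass, which is why a single sparse kernel can replace both directions.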