AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to mathematical knowledge graph (KG) construction face two key bottlenecks: (1) incomplete corpora, which require labor-intensive manual curation, and (2) difficulty in automatically fusing heterogeneous, multi-source mathematical knowledge.
Method: We propose an end-to-end automated framework that models Definition, Theorem, and Problem entities as nodes of a directed graph whose edges are reference relationships. It combines SBERT-based embedding with two strategies, LLM in-context learning for data augmentation, and LLM-driven knowledge extraction, completion, and fusion decisions. We introduce MathVD, a mathematics-specific vector database, and two update mechanisms: knowledge completion, in which a Math LLM supplies missing proofs or solutions, and knowledge fusion, in which MathVD retrieves similar entities and an LLM decides whether to merge a new entity with a candidate or add it as a new entity.
Results: Built from ProofWiki, arXiv papers, textbooks, and TheoremQA, the system outperforms five baselines on reachability queries in MathVD, and the Math LLM shows robust mathematical reasoning. Together, these components yield a high-quality, wide-coverage, multi-dimensional mathematical KG that updates automatically.

📝 Abstract
A mathematical knowledge graph (KG) presents knowledge within the field of mathematics in a structured manner. Constructing a math KG from natural language is an essential but challenging task. Existing work has two major limitations: first, it is constrained by corpus completeness, often discarding or manually supplementing incomplete knowledge; second, it typically fails to fully automate the integration of diverse knowledge sources. This paper proposes AutoMathKG, a high-quality, wide-coverage, and multi-dimensional math KG capable of automatic updates. AutoMathKG regards mathematics as a vast directed graph composed of Definition, Theorem, and Problem entities, with their reference relationships as edges. It integrates knowledge from ProofWiki, textbooks, arXiv papers, and TheoremQA, and uses large language models (LLMs) with in-context learning to augment entities and relationships. To search for similar entities, we build MathVD, a vector database, using two embedding strategies based on SBERT. To enable automatic updates, we propose two mechanisms. For knowledge completion, a Math LLM is developed to interact with AutoMathKG and supply missing proofs or solutions. For knowledge fusion, MathVD retrieves similar entities, and an LLM determines whether to merge a new entity with a candidate or add it as a new entity. Extensive experiments demonstrate the strong performance and broad applicability of the AutoMathKG system, including superior reachability query results in MathVD compared to five baselines and robust mathematical reasoning capability in the Math LLM.
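The abstract's data model can be sketched as follows. It uses Definition, Theorem, and Problem entities as nodes and reference relationships as directed edges. This is a minimal illustration under stated assumptions. Entities are keyed by a unique name, and each one stores its outgoing references as a list. The names `EntityType`, `Entity`, `MathKG`, and `reachable_from` are hypothetical, not the paper's code. The query collects every entity that a node depends on, directly or transitively. It only illustrates reachability over reference edges and is not the MathVD reachability benchmark reported in the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    DEFINITION = "Definition"
    THEOREM = "Theorem"
    PROBLEM = "Problem"


@dataclass
class Entity:
    name: str
    kind: EntityType
    # Names of the entities this entity references (outgoing edges).
    references: list[str] = field(default_factory=list)


class MathKG:
    """Hypothetical toy KG: nodes are entities, edges point to referenced entities."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def reachable_from(self, start: str) -> set[str]:
        """Breadth-first search over reference edges. Returns every entity that
        `start` depends on directly or transitively, excluding `start` itself."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            node = self.entities.get(current)
            if node is None:
                # Dangling reference: the target is not (yet) in the KG.
                # It is still reported as reachable but cannot be expanded.
                continue
            for ref in node.references:
                if ref not in seen and ref != start:
                    seen.add(ref)
                    queue.append(ref)
        return seen


# Usage: a problem that relies on a theorem, which relies on two definitions.
kg = MathKG()
kg.add(Entity("Group", EntityType.DEFINITION))
kg.add(Entity("Subgroup", EntityType.DEFINITION, ["Group"]))
kg.add(Entity("Lagrange theorem", EntityType.THEOREM, ["Subgroup", "Group"]))
kg.add(Entity("Element order problem", EntityType.PROBLEM, ["Lagrange theorem"]))
print(sorted(kg.reachable_from("Element order problem")))
# ['Group', 'Lagrange theorem', 'Subgroup']
```

Storing only outgoing references keeps insertion cheap. Answering "which entities cite X?" efficiently would need a reverse index as well.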
Problem

Automates construction of math knowledge graphs from diverse sources
Overcomes the limitations of incomplete corpora and manual supplementation
Enables automatic updates and integration of mathematical entities
Innovation

Uses LLMs for data augmentation via in-context learning
Builds the MathVD vector database using two SBERT-based embedding strategies
Automates updates via knowledge completion and fusion
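The fusion step described in the abstract works in two stages. First, similar entities are retrieved from MathVD. Then an LLM chooses between merging with a candidate and adding a new entity. The sketch below illustrates that control flow and rests on several assumptions. Hand-written low-dimensional vectors stand in for SBERT embeddings. `ToyVectorStore` is a hypothetical brute-force cosine-similarity substitute for MathVD. `threshold_judge` is a placeholder for the LLM decision. None of these names or thresholds come from the paper.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for MathVD: exhaustive cosine search."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}

    def add(self, name: str, vector: list[float]) -> None:
        self.vectors[name] = vector

    def top_k(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        scored = [(name, cosine(query, vec)) for name, vec in self.vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def fuse(
    store: ToyVectorStore,
    name: str,
    vector: list[float],
    should_merge: Callable[[str, str, float], bool],
    k: int = 3,
) -> Optional[str]:
    """Retrieve the k most similar entities and ask `should_merge` about each one.
    Returns the candidate to merge into (content merging is omitted here),
    or None after adding the entity to the store as a new entry."""
    for candidate, score in store.top_k(vector, k):
        if should_merge(name, candidate, score):
            return candidate
    store.add(name, vector)  # no candidate accepted: add as a new entity
    return None


def threshold_judge(new: str, candidate: str, score: float) -> bool:
    """Placeholder for the LLM judge: merge only when similarity is very high."""
    return score >= 0.95


# Usage with made-up 3-dimensional "embeddings".
store = ToyVectorStore()
store.add("Lagrange theorem", [0.9, 0.1, 0.0])
store.add("Cauchy theorem", [0.1, 0.9, 0.2])
print(fuse(store, "Lagrange theorem (group order)", [0.88, 0.12, 0.01], threshold_judge))
# Lagrange theorem
print(fuse(store, "Sylow theorem", [0.0, 0.3, 0.95], threshold_judge))
# None
```

A fixed similarity threshold is only a crude proxy for the LLM decision. The paper's approach uses the LLM to judge whether two entities are actually the same, which a similarity score alone cannot settle.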
Authors

Rong Bian
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Yu Geng
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Zijian Yang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Bing Cheng
Chinese Academy of Sciences
machine learning, artificial intelligence, finance, economics