AI Summary
Existing approaches to automatically constructing educational knowledge graphs (EduKGs) from PDF learning materials suffer from poor scalability, low accuracy, and unreliable knowledge representation. To address these challenges, this paper proposes an end-to-end automated framework: first, it extracts slide-level knowledge units from PDFs to build a fine-grained knowledge graph; second, it generates a course-level EduKG via semantic alignment and hierarchical fusion. The framework integrates information extraction, document-structure-aware parsing, context-enhanced joint entity-relation extraction, and multi-source knowledge fusion, with domain-specific optimizations in each component. Evaluated on the CourseMapper platform, the optimized pipeline improves graph accuracy by 17.5% and processing efficiency by 10× over the unoptimized version, enhancing the reliability, reusability, and contextual adaptability of educational knowledge representation.
Abstract
The automatic construction of Educational Knowledge Graphs (EduKGs) is essential for domain knowledge modeling, as it extracts meaningful representations from learning materials. Despite growing interest, identifying a scalable and reliable approach for automatic EduKG generation remains a challenge. In this study, we propose a unified and robust pipeline for automatic EduKG construction from PDF learning materials. The process begins with generating slide-level EduKGs from individual pages/slides, which are then merged to form a comprehensive EduKG representing the entire learning material. We evaluate the accuracy of the EduKG generated by the proposed pipeline in our MOOC platform, CourseMapper. The observed accuracy, while indicative of partial success, is relatively low, particularly in the educational context, where the reliability of knowledge representations is critical for supporting meaningful learning. To address this, we introduce targeted optimizations across multiple pipeline components. The optimized pipeline achieves a 17.5% improvement in accuracy and a tenfold increase in processing efficiency. Our approach offers a holistic, scalable, end-to-end pipeline for automatic EduKG construction that is adaptable to diverse educational contexts and supports improved semantic representation of learning content.
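The abstract does not specify how slide-level EduKGs are merged, so the following is only a minimal sketch of the general idea, assuming that concepts are matched by a normalized label. The data layout, the `normalize` and `merge_slide_graphs` helpers, and the `related_to` relation name are all hypothetical and chosen for illustration; the paper's actual merging step may rely on semantic alignment rather than string matching.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(label.lower().split())


def merge_slide_graphs(slide_graphs):
    """Merge slide-level graphs into one course-level graph.

    Each slide graph is assumed to be a dict with a 'slide' number and
    'edges', a list of (source_label, relation, target_label) triples.
    """
    nodes = defaultdict(set)  # normalized concept -> slides where it appears
    edges = defaultdict(set)  # (source, relation, target) -> slides
    for graph in slide_graphs:
        slide = graph["slide"]
        for source, relation, target in graph["edges"]:
            s, t = normalize(source), normalize(target)
            nodes[s].add(slide)
            nodes[t].add(slide)
            edges[(s, relation, t)].add(slide)
    return {"nodes": dict(nodes), "edges": dict(edges)}


# Two toy slide-level graphs that mention the same concepts with
# different capitalization and spacing.
slides = [
    {"slide": 1, "edges": [("Knowledge Graph", "related_to", "Ontology")]},
    {"slide": 2, "edges": [("knowledge  graph", "related_to", "ontology"),
                           ("Ontology", "related_to", "RDF")]},
]
course_graph = merge_slide_graphs(slides)
```

Keeping the set of slide numbers on every node and edge preserves the link back to the slide-level graphs, so a merged concept can still be traced to the pages where it was found.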