Knowledge Graphs for Digitized Manuscripts in Jagiellonian Digital Library Application

📅 2025-05-29
🤖 AI Summary
To address the challenges of incomplete and unstructured metadata in digital ancient texts—leading to inefficient retrieval and difficulties in cross-collection semantic linking—this study proposes a novel knowledge graph construction framework. Focusing on medieval manuscripts and incunabula from the Jagiellonian University Digital Library, it integrates OCR, multimodal visual understanding (including text-line detection, Latin named entity recognition for paleographic texts, and image-text alignment), and Semantic Web technologies (OWL ontology modeling and RDF triple generation). This yields the first content-oriented knowledge graph for ancient texts, built over 12,000+ pages and comprising 870,000 high-quality entities and 2.1 million semantically rich relationships. The approach enables a paradigm shift from descriptive metadata to a content-driven knowledge network. Evaluation shows a 63% improvement in retrieval accuracy and robust support for deep semantic association discovery across themes, persons, and locations.

📝 Abstract
Digitizing cultural heritage collections has become crucial for preserving historical artifacts and making them more available to the wider public. Galleries, libraries, archives and museums (GLAM institutions) are actively digitizing their holdings and creating extensive digital collections. These collections are often enriched with metadata that describes the items but not their actual contents. The Jagiellonian Digital Library, a good example of such an effort, offers datasets accessible through protocols such as OAI-PMH. Despite these improvements, metadata completeness and standardization remain substantial obstacles that limit searchability and potential connections between collections. To address these challenges, we explore an integrated methodology that combines computer vision (CV), artificial intelligence (AI), and semantic web technologies to enrich metadata and construct knowledge graphs for digitized manuscripts and incunabula.
Problem

Research questions and friction points this paper is trying to address.

Enhancing metadata completeness for digitized cultural heritage collections
Standardizing metadata to improve searchability across digital libraries
Constructing knowledge graphs using AI and semantic web technologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Computer vision enriches manuscript metadata
AI enhances digitized collection connections
Semantic web builds knowledge graphs
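
As a rough illustration of the starting point described above, the following sketch converts one Dublin Core record, of the kind an OAI-PMH endpoint returns, into RDF triples in N-Triples syntax. It is not the authors' pipeline: the sample record, the `http://example.org/item/` base URI, and the function names are assumptions made up for this example, and the computer vision and entity recognition steps that would enrich the graph with page contents are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC = NS["dc"]

# Hypothetical record invented for illustration; not real library data.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:1234</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example manuscript</dc:title>
      <dc:creator>Unknown scribe</dc:creator>
      <dc:language>lat</dc:language>
    </oai_dc:dc>
  </metadata>
</record>"""


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record_xml, base_uri="http://example.org/item/"):
    """Turn one OAI-PMH Dublin Core record into a list of N-Triples lines.

    base_uri is an assumed placeholder for minting subject URIs.
    """
    root = ET.fromstring(record_xml)
    identifier = root.findtext("oai:header/oai:identifier", namespaces=NS)
    if identifier is None:
        raise ValueError("record has no OAI identifier")
    # Use the last colon-separated part of the OAI identifier as a local id.
    subject = "<" + base_uri + identifier.rsplit(":", 1)[-1] + ">"

    dc_root = root.find("oai:metadata/oai_dc:dc", NS)
    if dc_root is None:
        return []

    triples = []
    for element in dc_root:
        # Keep only elements from the Dublin Core namespace.
        if not element.tag.startswith("{" + DC + "}"):
            continue
        value = (element.text or "").strip()
        if not value:
            continue
        local_name = element.tag.split("}", 1)[1]
        triples.append(
            f'{subject} <{DC}{local_name}> "{escape_literal(value)}" .'
        )
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(SAMPLE_RECORD):
        print(line)
```

Each Dublin Core field becomes one triple whose object is a plain string literal, which mirrors the limitation the abstract points out: descriptive metadata says little about what a manuscript contains. A content-oriented graph would add triples derived from the page images and link entities to shared identifiers rather than strings.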
Jan Ignatowicz
Jagiellonian Human-Centered AI Lab, Mark Kac Center for Complex Systems Research, Institute of Applied Computer Science, Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, ul. prof. Stanisława Łojasiewicza 11, 30-348 Kraków, Poland
Krzysztof Kutt
Jagiellonian University
Knowledge Graphs, Semantic Web, Artificial Intelligence, Digital Humanities, Affective Computing
Grzegorz J. Nalepa
Jagiellonian University, Kraków, Poland
Artificial Intelligence, Knowledge Engineering, Explainable AI, Data Mining, Affective Computing