Knowledge Graphs for Digitized Manuscripts in Jagiellonian Digital Library Application

📅 2025-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Incomplete and unstructured metadata for digitized ancient texts makes retrieval inefficient and cross-collection semantic linking difficult. To address this, the study proposes a knowledge graph construction framework. Focusing on medieval manuscripts and incunabula from the Jagiellonian University Digital Library, it integrates OCR, multimodal visual understanding (text-line detection, Latin named entity recognition for paleographic texts, and image-text alignment), and Semantic Web technologies (OWL ontology modeling and RDF triple generation). The result is described as the first content-oriented knowledge graph for ancient texts, built over 12,000+ pages and comprising 870,000 entities and 2.1 million relationships. The approach shifts the focus from descriptive metadata to a content-driven knowledge network. Evaluation shows a 63% improvement in retrieval accuracy and support for discovering semantic associations across themes, persons, and locations.

📝 Abstract

Digitizing cultural heritage collections has become crucial for preserving historical artifacts and making them more available to the wider public. Galleries, libraries, archives, and museums (GLAM institutions) are actively digitizing their holdings and creating extensive digital collections. These collections are often enriched with metadata that describes the items but not their contents. The Jagiellonian Digital Library is a good example of such an effort and offers datasets accessible through protocols such as OAI-PMH. Despite this progress, metadata completeness and standardization remain substantial obstacles, limiting searchability and potential connections between collections. To address these challenges, we explore a methodology that integrates computer vision (CV), artificial intelligence (AI), and semantic web technologies to enrich metadata and construct knowledge graphs for digitized manuscripts and incunabula.
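As a rough illustration of the step from harvested metadata to a knowledge graph, the sketch below parses one OAI-PMH record in the standard `oai_dc` (Dublin Core) format and emits N-Triples lines. It is not the authors' pipeline. The sample record, the `example.org` base URI, and the function names are assumptions made up for this example; only the OAI-PMH and Dublin Core namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, invented for illustration (not real library data).
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:12345</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Hypothetical manuscript title</dc:title>
      <dc:language>lat</dc:language>
    </oai_dc:dc>
  </metadata>
</record>"""


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record_xml, base_uri="http://example.org/item/"):
    """Turn each Dublin Core field of one record into an N-Triples line."""
    root = ET.fromstring(record_xml)
    identifier = root.findtext("oai:header/oai:identifier", namespaces=NS)
    # Assumption: the part after the last colon is a usable local ID.
    subject = f"<{base_uri}{identifier.rsplit(':', 1)[-1]}>"
    triples = []
    for element in root.find("oai:metadata/oai_dc:dc", NS):
        # ElementTree tags look like "{namespace-uri}localname".
        namespace, _, local_name = element.tag[1:].partition("}")
        text = (element.text or "").strip()
        if namespace != NS["dc"] or not text:
            continue  # skip non-DC or empty fields
        triples.append(
            f'{subject} <{namespace}{local_name}> "{escape_literal(text)}" .'
        )
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(SAMPLE_RECORD):
        print(line)
```

Triples like these only restate the descriptive metadata. The content-level entities the abstract aims for (persons, places, and themes found in the text itself) would have to come from the CV and AI stages, which this sketch does not cover.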

Problem

Research questions and friction points this paper is trying to address.

Enhancing metadata completeness for digitized cultural heritage collections

Standardizing metadata to improve searchability across digital libraries

Constructing knowledge graphs using AI and semantic web technologies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Computer vision is used to enrich manuscript metadata

AI is used to strengthen connections between digitized collections

Semantic web technologies are used to build knowledge graphs
