Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual page structure analysis of incunabula (early printed books) is labor-intensive, inefficient, and error-prone. Method: This paper introduces the first end-to-end multimodal framework for historical book page analysis. We construct a dedicated dataset of 500 manually annotated pages, augmented with DocLayNet, and employ YOLO11n for fine-grained detection of text, headings, figures, tables, and handwritten regions (F1 = 0.94). OCR is performed using Tesseract—outperforming Kraken—and illustrative content is semantically described via a hybrid approach combining ResNet18 (98.7% image classification accuracy) and CLIP. Contribution/Results: This work unifies object detection, OCR, and cross-modal semantic understanding for incunabula analysis—the first such integration—demonstrating both the efficacy and scalability of deep learning in digital humanities research on early printed materials.
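
The summary above walks through a detection, OCR, and description pipeline. As a minimal sketch of the first step, the snippet below runs a YOLO11n layout detector over a page scan using the ultralytics package; the weights file name and confidence threshold are hypothetical, not taken from the paper.

```python
# Sketch of the page-layout detection step (not the authors' exact code).
from ultralytics import YOLO

# Hypothetical checkpoint: a YOLO11n model fine-tuned on the five
# page-region classes described in the paper.
model = YOLO("incunabula_yolo11n.pt")

results = model("page_scan.jpg", conf=0.25)  # single scanned page

for box in results[0].boxes:
    label = results[0].names[int(box.cls)]   # Text, Title, Picture, Table, Handwriting
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # pixel coordinates of the region
    print(f"{label}: ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), conf={float(box.conf):.2f}")
```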

📝 Abstract
We developed a proof-of-concept method for the automatic analysis of the structure and content of incunabula pages. A custom dataset comprising 500 annotated pages from five different incunabula was created using resources from the Jagiellonian Digital Library. Each page was manually labeled with five predefined classes: Text, Title, Picture, Table, and Handwriting. Additionally, the publicly available DocLayNet dataset was utilized as supplementary training data. To perform object detection, YOLO11n and YOLO11s models were employed and trained using two strategies: a combined dataset (DocLayNet and the custom dataset) and the custom dataset alone. The highest performance (F1 = 0.94) was achieved by the YOLO11n model trained exclusively on the custom data. Optical character recognition was then conducted on regions classified as Text, using both Tesseract and Kraken OCR, with Tesseract demonstrating superior results. Subsequently, image classification was applied to the Picture class using a ResNet18 model, achieving an accuracy of 98.7% across five subclasses: Decorative_letter, Illustration, Other, Stamp, and Wrong_detection. Furthermore, the CLIP model was utilized to generate semantic descriptions of illustrations. The results confirm the potential of machine learning in the analysis of early printed books, while emphasizing the need for further advancements in OCR performance and visual content interpretation.
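
The abstract describes two downstream steps applied to detected regions: OCR on Text crops and semantic description of Picture crops. Below is a hedged sketch of both, assuming pytesseract with a Latin language pack (lang="lat") and the openai/clip-vit-base-patch32 checkpoint from Hugging Face; the crop coordinates and candidate captions are purely illustrative.

```python
# Sketch of the OCR and CLIP-description steps under assumed tooling;
# the paper names only "Tesseract" and "CLIP", not these exact packages.
from PIL import Image
import pytesseract
import torch
from transformers import CLIPModel, CLIPProcessor

page = Image.open("page_scan.jpg")

# OCR on a region detected as Text (box coordinates come from the YOLO step).
text_crop = page.crop((120, 340, 980, 1450))           # hypothetical box
transcription = pytesseract.image_to_string(text_crop, lang="lat")

# Zero-shot description of a region detected as Picture.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

picture_crop = page.crop((100, 100, 600, 700))         # hypothetical box
candidate_captions = [                                  # illustrative prompts
    "a woodcut illustration of a religious scene",
    "a decorative initial letter",
    "a library stamp",
]
inputs = clip_processor(text=candidate_captions, images=picture_crop,
                        return_tensors="pt", padding=True)
with torch.no_grad():
    probs = clip_model(**inputs).logits_per_image.softmax(dim=-1)

print(transcription[:200])
print("best caption:", candidate_captions[int(probs.argmax())])
```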
Problem

Research questions and friction points this paper is trying to address.

Automatic analysis of incunabula pages' structure and content
Improving OCR performance for historical text recognition
Enhancing visual content interpretation in early printed books
Innovation

Methods, ideas, or system contributions that make the work stand out.

Custom dataset with annotated incunabula pages
YOLO11n model for object detection
ResNet18 for image classification (see the sketch after this list)
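
A minimal sketch of the Picture-subclass classifier: a torchvision ResNet18 with its final layer swapped for the five subclasses named in the abstract. The directory layout, preprocessing, and optimizer settings are assumptions for illustration, not details reported by the authors.

```python
# Sketch of ResNet18 fine-tuning on Picture crops (assumed setup).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

SUBCLASSES = ["Decorative_letter", "Illustration", "Other", "Stamp", "Wrong_detection"]

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(SUBCLASSES))  # 5-way head

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder of Picture crops, one subdirectory per subclass.
train_set = ImageFolder("picture_crops/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one training epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```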
Klaudia Ropel
Department of Human-Centered Artificial Intelligence, Institute of Applied Computer Science, Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, prof. Stanisława Łojasiewicza 11, 30-348 Kraków, Poland
Krzysztof Kutt
Jagiellonian University
Knowledge Graphs, Semantic Web, Artificial Intelligence, Digital Humanities, Affective Computing
Luiz do Valle Miranda
Department of Human-Centered Artificial Intelligence, Institute of Applied Computer Science, Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, prof. Stanisława Łojasiewicza 11, 30-348 Kraków, Poland
Grzegorz J. Nalepa
Jagiellonian University, Kraków, Poland
Artificial Intelligence, Knowledge Engineering, Explainable AI, Data Mining, Affective Computing