Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce's Manuscripts with Vision-Language Models

📅 2025-11-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of parsing pages that mix text and images in Peirce's manuscripts, a task poorly supported by existing digital humanities tools. It proposes a cross-modal framework that combines vision-language models (VLMs) with Peirce's semiotics in three steps: (1) document layout segmentation extracts image and text fragments; (2) each fragment is linked back to its position on the page through IIIF-compliant annotations, and semiotically grounded prompts guide a VLM to generate interpretable descriptions of the diagrams; (3) the resulting captions are integrated into a knowledge graph. The key contribution is the pairing of VLMs with Peirce's semiotic framework to support automated, explainable extraction of diagrammatic knowledge from manuscripts. As a preliminary study, it aims to improve the retrievability, interpretability, and knowledge integration of diagrams, and it suggests an approach to digitizing heterogeneous historical documents.
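Step (2) can be made concrete with a minimal sketch. It assumes each layout segment is a pixel bounding box on the full page image, and expresses it as a W3C Web Annotation targeting an IIIF canvas region with an `#xywh=` fragment. The `Segment` class, the label values, and the `example.org` URLs are hypothetical; the summary does not describe the paper's actual data model.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A layout region in pixel coordinates of the full page image (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int
    label: str  # e.g. "diagram" or "text"; the label set is an assumption


def to_annotation(segment: Segment, canvas_id: str, anno_id: str) -> dict:
    """Build a Web Annotation whose target is a rectangular region of a canvas."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": segment.label, "purpose": "tagging"},
        # Media-fragment selector: x, y, width, height on the canvas.
        "target": f"{canvas_id}#xywh={segment.x},{segment.y},{segment.w},{segment.h}",
    }


def image_region_url(image_service: str, segment: Segment) -> str:
    """IIIF Image API 3.0 URL: {service}/{region}/{size}/{rotation}/{quality}.{format}."""
    region = f"{segment.x},{segment.y},{segment.w},{segment.h}"
    return f"{image_service}/{region}/max/0/default.jpg"


if __name__ == "__main__":
    seg = Segment(120, 340, 800, 600, "diagram")
    anno = to_annotation(seg, "https://example.org/canvas/1", "https://example.org/anno/1")
    print(json.dumps(anno, indent=2))
    print(image_region_url("https://example.org/iiif/page1", seg))
```

The region URL is what a pipeline of this kind could pass to a VLM as the cropped diagram, while the annotation keeps the crop tied to its place on the page.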

📝 Abstract

Diagrams are crucial yet underexplored tools in many disciplines, demonstrating the close connection between visual representation and scholarly reasoning. However, their iconic form poses obstacles to visual studies, intermedial analysis, and text-based digital workflows. In particular, Charles S. Peirce consistently advocated the use of diagrams as essential for reasoning and explanation. His manuscripts, often combining textual content with complex visual artifacts, provide a challenging case for studying documents involving heterogeneous materials. In this preliminary study, we investigate whether Visual Language Models (VLMs) can effectively help us identify and interpret such hybrid pages in context. First, we propose a workflow that (i) segments manuscript page layouts, (ii) reconnects each segment to IIIF-compliant annotations, and (iii) submits fragments containing diagrams to a VLM. In addition, by adopting Peirce's semiotic framework, we designed prompts to extract key knowledge about diagrams and produce concise captions. Finally, we integrated these captions into knowledge graphs, enabling structured representations of diagrammatic content within composite sources.
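The abstract does not reproduce the prompts themselves. As an illustration only, the sketch below assembles a prompt around Peirce's well-known trichotomy of icon, index, and symbol. The question wording, the `SIGN_QUESTIONS` table, and the `build_prompt` helper are assumptions, not the authors' prompt design.

```python
# Hypothetical questions, one per sign type in Peirce's icon/index/symbol trichotomy.
SIGN_QUESTIONS = {
    "icon": "What does the diagram resemble or depict through its form?",
    "index": "What does it point to or connect with, such as labels or nearby text?",
    "symbol": "Which conventional notations or written symbols does it use?",
}


def build_prompt(surrounding_text: str, max_words: int = 30) -> str:
    """Assemble a text prompt to send to a VLM alongside a cropped diagram image."""
    questions = "\n".join(f"- {name}: {q}" for name, q in SIGN_QUESTIONS.items())
    # Fall back to a placeholder when no transcription accompanies the fragment.
    context = surrounding_text.strip() or "(no transcription available)"
    return (
        "You are describing a diagram cropped from a manuscript page.\n"
        f"Nearby text on the page: {context}\n"
        "Answer each question briefly:\n"
        f"{questions}\n"
        f"Then write one caption of at most {max_words} words."
    )


if __name__ == "__main__":
    print(build_prompt("a line of handwritten text next to the figure"))
```

Keeping the questions in a table makes it easy to swap in a different semiotic vocabulary without touching the prompt template.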

Problem

Research questions and friction points this paper is trying to address.

Extracting visual knowledge from Peirce's hybrid manuscript pages

Overcoming obstacles in analyzing diagrams within text-based workflows

Interpreting complex visual artifacts using vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Segment manuscript page layouts for analysis

Use Visual Language Models to interpret diagrams

Integrate diagram captions into knowledge graphs
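The last of these steps, turning captions into graph content, can be sketched as plain subject-predicate-object triples. The `ex:` vocabulary, the `Triple` type, and both helper functions are hypothetical; the page does not state which ontology or graph store the authors use.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def caption_to_triples(
    fragment_uri: str, page_uri: str, caption: str, sign_types: Iterable[str]
) -> List[Triple]:
    """Describe one diagram fragment as triples (the ex: terms are made up)."""
    triples = [
        Triple(fragment_uri, "rdf:type", "ex:Diagram"),
        Triple(fragment_uri, "ex:partOf", page_uri),
        Triple(fragment_uri, "ex:caption", caption),
    ]
    # Deduplicate and sort so repeated runs give the same output.
    for sign in sorted(set(sign_types)):
        triples.append(Triple(fragment_uri, "ex:hasSignType", f"ex:{sign}"))
    return triples


def objects(triples: Iterable[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object attached to a subject by the given predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    frag = "https://example.org/canvas/1#xywh=120,340,800,600"
    graph = caption_to_triples(
        frag, "https://example.org/canvas/1", "A placeholder caption.", ["icon", "symbol", "icon"]
    )
    for t in graph:
        print(t)
    print(objects(graph, frag, "ex:hasSignType"))
```

Using the IIIF region URI as the subject keeps each graph node traceable to the exact area of the page it describes.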
