Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri

📅 2025-07-22
🤖 AI Summary
Linguistic diversity in the digital humanities limits the cross-lingual interoperability of SKOS vocabularies, which in turn impedes the discovery and reuse of knowledge resources. To address this, the authors propose WOKIE, a modular, open-source, ready-to-use pipeline for translating SKOS thesauri. WOKIE combines external translation services with targeted refinement by large language models (LLMs), balancing translation quality, scalability, and cost, and it requires neither machine translation expertise nor high-end hardware. The pipeline can be configured with different translation services and LLMs. It is evaluated on several digital humanities thesauri in 15 languages, with a systematic analysis of translation quality, performance, and ontology matching improvements. The results indicate that WOKIE is suitable for making thesauri more accessible and reusable across languages and for improving ontology matching performance in multilingual research infrastructures.

📝 Abstract
We introduce WOKIE, an open-source, modular, and ready-to-use pipeline for the automated translation of SKOS thesauri. This work addresses a critical need in the Digital Humanities (DH), where language diversity can limit access, reuse, and semantic interoperability of knowledge resources. WOKIE combines external translation services with targeted refinement using Large Language Models (LLMs), balancing translation quality, scalability, and cost. Designed to run on everyday hardware and be easily extended, the application requires no prior expertise in machine translation or LLMs. We evaluate WOKIE across several DH thesauri in 15 languages with different parameters, translation services and LLMs, systematically analysing translation quality, performance, and ontology matching improvements. Our results show that WOKIE is suitable to enhance the accessibility, reuse, and cross-lingual interoperability of thesauri by hurdle-free automated translation and improved ontology matching performance, supporting more inclusive and multilingual research infrastructures.
Problem

Research questions and friction points this paper is trying to address.

Automated translation of SKOS thesauri for Digital Humanities
Addressing language diversity in knowledge resource accessibility
Improving cross-lingual interoperability and ontology matching performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source pipeline for SKOS thesauri translation
Combines translation services with LLM refinement
Runs on everyday hardware, no expertise needed
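
The combination of translation services with targeted LLM refinement can be illustrated with a minimal sketch. This is not WOKIE's actual code or API: the function names, the dictionary layout for concepts, and the rule "call the LLM only when fewer than `min_agreement` services agree" are all assumptions made for illustration, and the translation services and the LLM are passed in as plain callables so that any backend could be plugged in.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical signatures (assumptions, not taken from the paper):
# a service maps (text, source_lang, target_lang) to a translation;
# a refiner additionally receives the candidate translations.
Service = Callable[[str, str, str], str]
Refiner = Callable[[str, str, str, List[str]], str]


def translate_label(label: str, src: str, tgt: str,
                    services: List[Service], refine: Refiner,
                    min_agreement: int = 2) -> str:
    """Return a translation, asking the LLM refiner only on disagreement."""
    candidates = [svc(label, src, tgt).strip() for svc in services]
    # Compare candidates case-insensitively to count agreeing services.
    counts = Counter(c.lower() for c in candidates)
    best, votes = counts.most_common(1)[0]
    if votes >= min_agreement:
        # Enough services agree: keep the first candidate with that wording.
        return next(c for c in candidates if c.lower() == best)
    # Otherwise hand the candidates to the (assumed) LLM refinement step.
    return refine(label, src, tgt, candidates)


def add_pref_labels(concepts: List[Dict], src: str, tgt: str,
                    services: List[Service], refine: Refiner) -> List[Dict]:
    """Add a target-language prefLabel to each concept that lacks one."""
    for concept in concepts:
        labels = concept["prefLabel"]  # maps language tag -> label
        if tgt in labels or src not in labels:
            continue  # already translated, or nothing to translate from
        labels[tgt] = translate_label(labels[src], src, tgt, services, refine)
    return concepts
```

In this sketch, existing target-language labels are never overwritten, and the more expensive refinement call is skipped whenever the cheaper services already agree, which is one way to trade off quality against cost.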
Felix Kraus
Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, Germany
Nicolas Blumenröhr
Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, Germany
Danah Tonne
Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, Germany
Achim Streit
Director of Scientific Computing Center (SCC), Professor for Computer Science, Karlsruhe
computational science and engineering, distributed systems, grid computing, data management