🤖 AI Summary
Linguistic diversity in the digital humanities limits the cross-lingual interoperability of SKOS thesauri and hinders the discovery and reuse of knowledge resources. To address this, we propose WOKIE, a modular, open-source, ready-to-use pipeline for translating SKOS thesauri. WOKIE combines external translation services with targeted refinement by large language models (LLMs). It balances translation quality, scalability, and cost, and requires neither machine-translation expertise nor high-end hardware. Different translation services and LLMs can be configured. WOKIE is evaluated on several digital humanities thesauri in 15 languages. The results indicate that it supports the accessibility, reuse, and cross-lingual interoperability of thesauri through automated translation and improved ontology matching performance. This makes it an easily deployable option for semantic interoperability in multilingual research infrastructures.
📝 Abstract
We introduce WOKIE, an open-source, modular, and ready-to-use pipeline for the automated translation of SKOS thesauri. This work addresses a critical need in the Digital Humanities (DH), where language diversity can limit access, reuse, and semantic interoperability of knowledge resources. WOKIE combines external translation services with targeted refinement using Large Language Models (LLMs), balancing translation quality, scalability, and cost. Designed to run on everyday hardware and to be easily extended, the application requires no prior expertise in machine translation or LLMs. We evaluate WOKIE on several DH thesauri in 15 languages using different parameters, translation services, and LLMs, systematically analysing translation quality, performance, and ontology matching improvements. Our results show that WOKIE can enhance the accessibility, reuse, and cross-lingual interoperability of thesauri through barrier-free automated translation and improved ontology matching performance, supporting more inclusive and multilingual research infrastructures.
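The abstract describes the pipeline only at a high level. The sketch below shows one possible control flow for "external translation services plus targeted LLM refinement". It assumes that "targeted" means the LLM is consulted only when the configured services disagree on a label. The abstract does not confirm this, so treat it as an assumption. `service_a`, `service_b`, and `llm_refine` are hypothetical offline stubs. They are not WOKIE's interfaces or any real translation client. The `example.org` concept URIs are illustrative. The output uses the standard SKOS namespace and emits one language-tagged `skos:prefLabel` per language for each concept.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plug-in signatures (not WOKIE's actual interfaces):
# a translation service maps (label, source_lang, target_lang) -> translation,
# a refiner maps (label, candidate_translations, context) -> final translation.
Translator = Callable[[str, str, str], str]
Refiner = Callable[[str, List[str], str], str]

SKOS = "http://www.w3.org/2004/02/skos/core#"


def service_a(label: str, src: str, tgt: str) -> str:
    # Offline stub standing in for an external translation API (en -> de only).
    return {"Cathedral": "Kathedrale", "Manuscript": "Handschrift"}.get(label, label)


def service_b(label: str, src: str, tgt: str) -> str:
    # A second stub service that disagrees with service_a on one term.
    return {"Cathedral": "Kathedrale", "Manuscript": "Manuskript"}.get(label, label)


def llm_refine(label: str, candidates: List[str], context: str) -> str:
    # Placeholder for an LLM prompt that would see the source label, all
    # candidate translations and thesaurus context (e.g. a scope note).
    # This stub simply keeps the first candidate.
    return candidates[0]


def translate_label(
    label: str,
    src: str,
    tgt: str,
    services: List[Translator],
    refine: Refiner,
    context: str = "",
) -> Tuple[str, bool]:
    """Return (translation, used_llm); the refiner runs only on disagreement."""
    candidates = [svc(label, src, tgt).strip() for svc in services]
    if len({c.casefold() for c in candidates}) == 1:
        return candidates[0], False  # all services agree: no LLM call needed
    return refine(label, candidates, context), True


def turtle_literal(text: str, lang: str) -> str:
    # Escape backslashes and quotes for a Turtle string, then add the language tag.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concepts: Dict[str, Dict[str, str]]) -> str:
    """Serialise one language-tagged skos:prefLabel per language per concept."""
    lines = [f"@prefix skos: <{SKOS}> ."]
    for uri, labels in concepts.items():
        objs = ", ".join(turtle_literal(t, lang) for lang, t in sorted(labels.items()))
        lines.append(f"<{uri}> skos:prefLabel {objs} .")
    return "\n".join(lines)


if __name__ == "__main__":
    thesaurus = {
        "http://example.org/concept/1": {"en": "Cathedral"},
        "http://example.org/concept/2": {"en": "Manuscript"},
    }
    llm_calls = 0
    for labels in thesaurus.values():
        de, used_llm = translate_label(
            labels["en"], "en", "de", [service_a, service_b], llm_refine
        )
        labels["de"] = de
        llm_calls += int(used_llm)
    print(to_turtle(thesaurus))
    print("LLM calls:", llm_calls)
```

Gating the LLM on disagreement is one plausible way to keep cost low. In this toy run, only "Manuscript" triggers a refiner call. Because services and refiners are plain callables with a shared signature, swapping in a different service or LLM would only require a new function. A real pipeline would also need to parse the input thesaurus and handle more label types, such as `skos:altLabel`.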