AI Summary
This work addresses the automatic subject indexing of German technical literature records from the TIBKAT system, aligning them with the German authority classification scheme Gemeinsame Normdatei (GND). We propose a novel cross-lingual ontology alignment paradigm: for the first time, we adapt the OntoAligner framework to subject indexing by formalizing label assignment as a semantic alignment task between GND concepts and document descriptions. Our approach integrates multilingual semantic embeddings, retrieval-augmented generation (RAG)-enhanced candidate retrieval, and fine-grained similarity matching. Evaluated on SemEval-2025 Task 5, our method achieves significant improvements in GND category matching accuracy. It demonstrates strong robustness on German–English mixed records and high cross-lingual transferability. The framework provides a scalable, language-agnostic solution for automated knowledge organization of multilingual scientific literature.
Abstract
This paper presents our system, Homa, for SemEval-2025 Task 5: Subject Tagging, which focuses on automatically assigning subject labels to technical records from TIBKAT using the Gemeinsame Normdatei (GND) taxonomy. We leverage OntoAligner, a modular ontology alignment toolkit, to address this task by integrating retrieval-augmented generation (RAG) techniques. Our approach formulates the subject tagging problem as an alignment task, where records are matched to GND categories based on semantic similarity. We evaluate OntoAligner's adaptability for subject indexing and analyze its effectiveness in handling multilingual records. Experimental results demonstrate the strengths and limitations of this method, highlighting the potential of alignment techniques for improving subject tagging in digital libraries.
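The alignment formulation can be illustrated with a minimal sketch, in which each record is scored against every subject entry and the highest-scoring entries become its tags. This is not the OntoAligner API and not the Homa system: the bag-of-words `embed` function is a toy stand-in for a multilingual embedding model, and the names `rank_subjects` and `SUBJECTS`, along with the three subject entries, are hypothetical and not real GND records.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a multilingual embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_subjects(record_text, subjects, top_k=2):
    """Return the top_k (label, score) pairs, most similar subject first."""
    query = embed(record_text)
    scored = [
        (label, cosine(query, embed(label + " " + description)))
        for label, description in subjects.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical subject entries (label -> short description), for illustration only.
SUBJECTS = {
    "Maschinelles Lernen": "machine learning algorithms neural networks",
    "Bibliothek": "library catalogue digital libraries indexing",
    "Thermodynamik": "thermodynamics heat energy entropy",
}

record = "Neural networks for subject indexing in digital libraries"
top = rank_subjects(record, SUBJECTS, top_k=2)
for label, score in top:
    print(f"{label}: {score:.3f}")
```

In a retrieval-augmented setup, a ranking of this kind would typically serve only as the candidate retrieval step, with a language model then choosing among the retrieved candidates. That second stage is omitted here.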