Homa at SemEval-2025 Task 5: Aligning Librarian Records with OntoAligner for Subject Tagging

📅 2025-04-30

🤖 AI Summary

This work addresses automatic subject indexing of technical records from the TIBKAT catalogue, assigning them subjects from the Gemeinsame Normdatei (GND), the German integrated authority file. We adapt OntoAligner, a modular ontology alignment toolkit, to subject indexing by treating label assignment as a semantic alignment task between record descriptions and GND subjects. Our approach combines retrieval-augmented generation (RAG) with semantic similarity matching between records and GND categories. Evaluated on SemEval-2025 Task 5, the experiments show both strengths and limitations of the method, including its handling of multilingual records, and point to the potential of alignment techniques for subject tagging in digital libraries.

📝 Abstract

This paper presents our system, Homa, for SemEval-2025 Task 5: Subject Tagging, which focuses on automatically assigning subject labels to technical records from TIBKAT using the Gemeinsame Normdatei (GND) taxonomy. We leverage OntoAligner, a modular ontology alignment toolkit, to address this task by integrating retrieval-augmented generation (RAG) techniques. Our approach formulates the subject tagging problem as an alignment task, where records are matched to GND categories based on semantic similarity. We evaluate OntoAligner's adaptability for subject indexing and analyze its effectiveness in handling multilingual records. Experimental results demonstrate the strengths and limitations of this method, highlighting the potential of alignment techniques for improving subject tagging in digital libraries.
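To make the alignment formulation concrete, the following is a minimal sketch of ranking subject labels for one record by semantic similarity. It is not the Homa system and does not use the OntoAligner API: a bag-of-words cosine stands in for a real sentence embedding model, and the subject ids, labels, and function names (`embed`, `rank_subjects`) are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase token counts.

    Assumption: a real system would call a multilingual embedding model here.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_subjects(record_text, subjects, top_k=3):
    """Return the top_k (subject_id, score) pairs, most similar first."""
    record_vec = embed(record_text)
    scored = [(cosine(record_vec, embed(label)), sid) for sid, label in subjects.items()]
    # Sort by descending score; break ties by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(sid, score) for score, sid in scored[:top_k]]


# Hypothetical subject vocabulary; real GND entries carry identifiers and richer metadata.
SUBJECTS = {
    "subj-001": "machine learning",
    "subj-002": "library catalogue",
    "subj-003": "organic chemistry",
}

record = "Deep machine learning for library data"
ranked = rank_subjects(record, SUBJECTS, top_k=2)
```

Here `ranked` holds the two best-matching subject ids with their similarity scores. Swapping `embed` for a multilingual encoder would be the natural next step for German and English records, since token overlap alone cannot match terms across languages.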

Problem

Research questions and friction points this paper is trying to address.

Automatically assigning subject labels to technical records

Aligning records to GND taxonomy using semantic similarity

Evaluating OntoAligner for multilingual subject indexing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applying the OntoAligner ontology alignment toolkit to subject tagging

Retrieval-augmented generation (RAG) techniques

Semantic similarity for GND category matching
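The retrieval-augmented step listed above can be sketched as follows: candidate subjects found by retrieval are placed into a prompt, and the generated answer is filtered so that only ids from the candidate list survive. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the prompt wording, the comma-separated answer format, the `generate` callable, and the subject ids are all hypothetical.

```python
def build_prompt(record_text, candidates):
    """Format a record and its retrieved candidate subjects as a plain-text prompt."""
    lines = [f"- {sid}: {label}" for sid, label in candidates]
    return (
        "Record:\n" + record_text + "\n\n"
        "Candidate subjects:\n" + "\n".join(lines) + "\n\n"
        "Answer with the ids of the fitting subjects, comma-separated."
    )


def select_subjects(record_text, candidates, generate):
    """Ask a text generator to choose among candidates and validate its answer.

    `generate` is any callable mapping a prompt string to an answer string
    (assumption: a language model client would be wrapped to fit this shape).
    """
    allowed = {sid for sid, _ in candidates}
    answer = generate(build_prompt(record_text, candidates))
    chosen = []
    for part in answer.split(","):
        sid = part.strip()
        # Keep only ids that were actually offered, and drop duplicates.
        if sid in allowed and sid not in chosen:
            chosen.append(sid)
    return chosen


def fake_llm(prompt):
    """Stub generator for demonstration; returns one valid, one unknown, one repeated id."""
    return "subj-001, subj-999, subj-001"


candidates = [("subj-001", "machine learning"), ("subj-002", "library catalogue")]
chosen = select_subjects("Deep machine learning for library data", candidates, fake_llm)
```

Filtering against the candidate list keeps the output inside the controlled vocabulary even when the generator names a subject that was never retrieved.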
