UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

📅 2025-08-13
🤖 AI Summary
This study addresses fact-checked claim retrieval in multilingual and cross-lingual settings (SemEval-2025 Task 7). We propose a zero-shot system: only English translations are used as input, since multilingual embedding models did not achieve satisfactory results. The texts are encoded with several large language model embedding models (NV-Embed-v2, GPT, and Mistral), and the most relevant claims for each post are identified by cosine similarity between embeddings. NV-Embed-v2 gave the best results overall, and for some languages combining it with GPT or Mistral improved retrieval further. The system ranked 7th in the monolingual subtask and 9th in the cross-lingual subtask.

📝 Abstract
This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in the monolingual subtask and 9th place in the cross-lingual subtask. We used only English translations as input to the text embedding models, since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
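The retrieval step described in the abstract (embed the post and the claims, then rank claims by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cosine_similarity` and `retrieve_top_k`, the cutoff `k`, and the toy 2-dimensional vectors are all assumptions, and real embeddings would come from a model such as NV-Embed-v2.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(post_embedding, claim_embeddings, k=10):
    """Return (claim_index, score) pairs for the k most similar claims."""
    scored = [
        (i, cosine_similarity(post_embedding, claim))
        for i, claim in enumerate(claim_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real model embeddings (hypothetical data).
claims = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
post = [1.0, 0.1]
top = retrieve_top_k(post, claims, k=2)
```

The sketch is written in plain Python to stay dependency-free; with thousands of claims, a vectorized matrix product over normalized embeddings would be the more practical choice.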
Problem

Research questions and friction points this paper is trying to address.

Develop a zero-shot system for fact-checked claim retrieval
Combine multiple LLMs to improve retrieval accuracy
Evaluate multilingual and cross-lingual performance using embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot system for fact-checked claim retrieval
Combined state-of-the-art large language models
Leveraged embeddings and cosine similarity
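The summary does not state how the models were combined. One common option, shown here purely as an assumption, is to average the per-claim cosine similarity scores produced by each model and rank claims by the averaged score. The names `combine_scores` and `rank_claims`, the optional weights, and the score values are hypothetical.

```python
def combine_scores(per_model_scores, weights=None):
    """Weighted average of per-claim similarity scores from several models.

    per_model_scores: list of equal-length score lists, one list per model.
    weights: optional per-model weights; a plain average is used by default
             (an assumption, not taken from the paper).
    """
    if weights is None:
        weights = [1.0] * len(per_model_scores)
    total = sum(weights)
    n_claims = len(per_model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, per_model_scores)) / total
        for i in range(n_claims)
    ]

def rank_claims(combined):
    """Claim indices ordered from highest to lowest combined score."""
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

# Hypothetical similarity scores of three claims for one post.
model_a = [0.9, 0.2, 0.6]
model_b = [0.5, 0.3, 0.9]
combined = combine_scores([model_a, model_b])
ranking = rank_claims(combined)
```

Averaging scores only makes sense when the models' similarity ranges are comparable; if they are not, normalizing each model's scores or combining ranks instead would be a reasonable alternative.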
Ladislav Lenc
University of West Bohemia
Daniel Cífka
Department of Computer Science and Engineering, University of West Bohemia in Pilsen; New Technologies for the Information Society, University of West Bohemia in Pilsen
Jiří Martínek
University of West Bohemia
Jakub Šmíd
Department of Computer Science and Engineering, University of West Bohemia in Pilsen; New Technologies for the Information Society, University of West Bohemia in Pilsen
Pavel Král
Department of Computer Science and Engineering, University of West Bohemia in Pilsen; New Technologies for the Information Society, University of West Bohemia in Pilsen