UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

📅 2025-08-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses fact-checked claim retrieval in multilingual and cross-lingual settings. It presents a zero-shot system in which only English translations are used as input, because multilingual models did not achieve satisfactory results. Text embeddings are obtained from several large language models (NV-Embed-v2, GPT, and Mistral), and the most relevant claims for each post are identified by cosine similarity between embeddings. NV-Embed-v2 gave the best results overall, while combining it with GPT or Mistral helped for some languages. At SemEval-2025 Task 7, the system ranked 7th in the monolingual subtask and 9th in the cross-lingual subtask.

📝 Abstract

This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in monolingual and 9th in cross-lingual subtasks. We used only English translations as an input to the text embedding models since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
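The retrieval step described in the abstract (embed posts and claims, then rank claims by cosine similarity) can be illustrated with a minimal sketch. The function names, the toy 3-dimensional vectors, and the choice of `k` are assumptions made for illustration; the real system would use embeddings produced by the models named above, and the source does not describe its implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def top_k_claims(post_vec, claim_vecs, k=10):
    """Return (claim_id, score) pairs for the k claims most similar to the post."""
    scored = [(cid, cosine_similarity(post_vec, vec)) for cid, vec in claim_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors stand in for real model embeddings (hypothetical data).
claims = {
    "claim_a": [1.0, 0.0, 0.0],
    "claim_b": [0.6, 0.8, 0.0],
    "claim_c": [0.0, 0.0, 1.0],
}
post = [0.9, 0.1, 0.0]
ranking = top_k_claims(post, claims, k=2)
```

In this toy example `claim_a` points in nearly the same direction as the post, so it ranks first, followed by `claim_b`.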

Problem

Research questions and friction points this paper is trying to address.

Develop a zero-shot system for fact-checked claim retrieval

Combine multiple LLMs to improve retrieval accuracy

Evaluate multilingual and crosslingual performance using embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot system for fact-checked claim retrieval

Combined state-of-the-art large language models

Leveraged embeddings and cosine similarity
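The source does not say how the models were combined. One possible approach, shown here purely as an assumption, is to average each claim's similarity score across models before ranking. The function `combine_scores`, the model labels, and the scores below are all hypothetical.

```python
def combine_scores(per_model_scores, weights=None):
    """Weighted average of per-model similarity scores for each claim.

    per_model_scores maps a model name to a dict of claim_id -> score.
    All models are assumed to have scored the same set of claims.
    """
    models = list(per_model_scores)
    if weights is None:
        weights = {m: 1.0 for m in models}  # equal weighting by default
    total = sum(weights[m] for m in models)
    claim_ids = per_model_scores[models[0]].keys()
    return {
        cid: sum(weights[m] * per_model_scores[m][cid] for m in models) / total
        for cid in claim_ids
    }

# Hypothetical cosine scores for one post from two embedding models.
scores = {
    "model_a": {"claim_1": 0.80, "claim_2": 0.70},
    "model_b": {"claim_1": 0.60, "claim_2": 0.90},
}
combined = combine_scores(scores)
best = max(combined, key=combined.get)
```

Here the two models disagree on the top claim, and the averaged score decides the final order. Other combination schemes, such as concatenating embeddings or fusing ranks, would be equally consistent with the summary above.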
