🤖 AI Summary
This study addresses fact-checked claim retrieval in multilingual and cross-lingual settings. We propose a zero-shot framework that requires no labeled data in the target languages. The text embedding models receive only English translations of non-English posts and claims, because multilingual embedding models did not achieve satisfactory results. Posts and claims are encoded with several embedding models (NV-Embed-v2, GPT, and Mistral), and the most relevant claims for each post are identified by the cosine similarity between their embeddings. NV-Embed-v2 yields the strongest overall performance. For some languages, combining it with GPT or Mistral embeddings improves results further. The method ranks seventh on the monolingual subtask and ninth on the cross-lingual subtask.
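The retrieval step amounts to ranking every candidate claim by its cosine similarity to a post's embedding. The following is a minimal sketch, assuming the embeddings have already been computed from the English translations. The toy 3-dimensional vectors and the function names `cosine` and `retrieve` are hypothetical stand-ins, not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(post_vec, claim_vecs, k=10):
    """Return the indices of the k claims most similar to the post, best first."""
    scored = [(cosine(post_vec, c), i) for i, c in enumerate(claim_vecs)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical toy vectors standing in for real model embeddings.
post = [0.9, 0.1, 0.0]
claims = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.7, 0.7, 0.0]]
print(retrieve(post, claims, k=2))  # → [1, 2]
```

In a real system the claim embeddings would be computed once and reused for every post, so only the post needs to be encoded at query time.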
📝 Abstract
This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings and then combined the models to obtain the best possible results. We used only English translations as input to the text embedding models, since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by measuring the cosine similarity between their embeddings. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed combined with GPT or Mistral). Our approach achieved 7th place in the monolingual subtask and 9th in the cross-lingual subtask.
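The abstract does not specify how model combinations are formed. One plausible scheme, assumed here purely for illustration, is score-level fusion. The cosine similarity is computed separately in each model's embedding space and the scores are then combined as a weighted average. This avoids having to compare embeddings of different dimensions directly. The model names, weights, and toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_scores(post_embs, claim_embs, weights):
    """Weighted average of per-model cosine similarities (assumed fusion scheme).

    post_embs:  model name -> embedding of the post in that model's space
    claim_embs: model name -> list of claim embeddings, same claim order for every model
    weights:    model name -> non-negative weight
    """
    total = sum(weights.values())
    n_claims = len(next(iter(claim_embs.values())))
    scores = [0.0] * n_claims
    for name, w in weights.items():
        for i, claim_vec in enumerate(claim_embs[name]):
            scores[i] += (w / total) * cosine(post_embs[name], claim_vec)
    return scores


# Two hypothetical models with different embedding sizes (2-d and 3-d toy vectors).
post_embs = {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0, 0.0]}
claim_embs = {
    "model_a": [[1.0, 0.0], [0.0, 1.0]],
    "model_b": [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
}
weights = {"model_a": 0.7, "model_b": 0.3}

scores = fused_scores(post_embs, claim_embs, weights)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [0, 1]
```

With equal weights for two models, this average equals the cosine similarity of the concatenated, L2-normalized embeddings. In that case, concatenating the vectors and averaging the scores are interchangeable ways to combine models.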