🤖 AI Summary
To reduce the redundant fact-checking effort caused by the cross-lingual spread of misinformation, this paper presents the first systematic study of large language models (LLMs) for multilingual detection of previously debunked claims. It proposes a cross-lingual semantic matching framework that combines prompt engineering, zero-/few-shot inference, and machine translation—specifically English-to-Chinese and low-resource-language-to-English translation. Experiments cover seven state-of-the-art LLMs on the first benchmark dataset spanning 20 languages. Results show that high-resource languages already achieve high accuracy, while translating low-resource-language inputs into English significantly improves detection performance (average +18.7%), offering new insight into LLMs' cross-lingual semantic alignment capabilities. This work provides a scalable, language-agnostic technical foundation for global multilingual fact-checking.
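The translate-then-match pipeline the summary describes can be sketched as below. This is a minimal illustration, not the paper's implementation: `translate_to_english` is a hypothetical placeholder for a machine-translation step, and token-overlap (Jaccard) similarity stands in for the LLM-based semantic match judgment.

```python
def translate_to_english(text: str, lang: str) -> str:
    # Placeholder (assumption): a real pipeline would call a machine-translation
    # system here to map low-resource-language input into English.
    # For this toy demo the text is returned unchanged.
    return text


def match_score(a: str, b: str) -> float:
    # Toy stand-in for an LLM semantic-matching judgment:
    # token-level Jaccard similarity between the claim and a candidate.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def detect_previously_fact_checked(claim, lang, fact_check_db, threshold=0.5):
    """Return the best-matching previously fact-checked claim, or None."""
    query = translate_to_english(claim, lang)
    best = max(fact_check_db, key=lambda fc: match_score(query, fc))
    return best if match_score(query, best) >= threshold else None


# Tiny illustrative database of previously fact-checked claims.
db = ["the earth is flat", "vaccines cause autism"]
hit = detect_previously_fact_checked("scientists say the earth is flat", "en", db)
miss = detect_previously_fact_checked("cats are great pets", "en", db)
```

In a realistic system, `match_score` would be replaced by prompting an LLM (zero-/few-shot) to judge whether the input claim and the stored fact-check refer to the same assertion.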
📝 Abstract
In our era of widespread false information, human fact-checkers often duplicate effort when verifying claims that have already been addressed in other countries or languages. Because false information transcends linguistic boundaries, automatically detecting previously fact-checked claims across languages has become an increasingly important task. This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages. Moreover, translating original texts into English proves beneficial for low-resource languages. These findings highlight the potential of LLMs for multilingual previously fact-checked claim detection and provide a foundation for further research on this promising application of LLMs.