Multilingual Fact-Checking at Scale: Fine-Tuned Compact Models vs LLMs

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses high-throughput, low-latency fact-checking in multilingual settings with a modular system of three stages: claim detection, evidence retrieval and re-ranking, and veracity prediction. Instead of relying on general-purpose large language models, the system uses task-specific fine-tuned models (XLM-RoBERTa-Large for claim detection, mmBERT-base for stance classification, and a SetFit-based multilingual re-ranker), evaluated on production data spanning 114 languages for claim detection and 28 languages for veracity prediction. Compared with LLM baselines including GPT-5.2, Claude Opus 4.6, and Qwen3-8B, the fine-tuned components show strong and stable multilingual performance, the fine-tuned retrieval model remains competitive with modern proprietary embeddings, and same-hardware latency measurements show large efficiency gains for the encoder-based components, which supports self-hosted deployment under tight cost and privacy constraints.

📝 Abstract

We present a multilingual fact-checking system deployed at Factiverse, designed for high-throughput and low-latency operation across diverse languages. The system follows a modular pipeline with three stages: claim detection, evidence retrieval and re-ranking, and veracity prediction. We fine-tune XLM-RoBERTa-Large for claim detection, mmBERT-base for three-label stance classification (Supports/Refutes/Mixed), and a SetFit-based multilingual re-ranker for claim-evidence matching. We compare these components against strong LLM baselines, including GPT-5.2, Claude Opus 4.6, and Qwen3-8B. Experiments on production data spanning 114 languages for claim detection and 28 languages for veracity prediction show that task-specific fine-tuning provides strong and stable multilingual performance, while the fine-tuned retrieval model remains competitive with modern proprietary embeddings. Same-hardware latency measurements further show large efficiency gains for encoder-based components, supporting their use in production deployments with tight cost and privacy constraints. Overall, compact fine-tuned, self-hosted models remain a practical and effective foundation for multilingual fact-checking at scale. Code and data used for this study are available at https://github.com/factiverse/factcheck-editor.
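
To make the three-stage structure concrete, the following is a minimal sketch of how such a modular pipeline could be wired together. It is not the authors' implementation: the function `fact_check`, the `Verdict` type, and the four callables are hypothetical names, each callable stands in for one of the fine-tuned models named in the abstract, and the majority vote used to turn per-evidence stances into a single label is an assumption, since the abstract does not say how stances are aggregated.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Stance labels as listed in the abstract.
LABELS = ("Supports", "Refutes", "Mixed")


@dataclass
class Verdict:
    claim: str
    evidence: List[str]  # top-ranked evidence passages
    label: str           # one of LABELS


def fact_check(
    sentences: Sequence[str],
    is_claim: Callable[[str], bool],      # stand-in for the claim detector
    retrieve: Callable[[str], List[str]],  # stand-in for evidence retrieval
    score: Callable[[str, str], float],    # stand-in for the re-ranker
    stance: Callable[[str, str], str],     # stand-in for the stance classifier
    top_k: int = 3,
) -> List[Verdict]:
    """Run claim detection, retrieval plus re-ranking, and veracity prediction."""
    verdicts = []
    for sentence in sentences:
        # Stage 1: keep only sentences flagged as check-worthy claims.
        if not is_claim(sentence):
            continue
        # Stage 2: retrieve candidates, then re-rank by claim-evidence score.
        candidates = retrieve(sentence)
        ranked = sorted(
            candidates, key=lambda ev: score(sentence, ev), reverse=True
        )[:top_k]
        if not ranked:
            continue  # this sketch skips claims with no evidence
        # Stage 3: classify stance per passage, then majority-vote
        # (the aggregation rule is an assumption of this sketch).
        votes = Counter(stance(sentence, ev) for ev in ranked)
        label = votes.most_common(1)[0][0]
        verdicts.append(Verdict(sentence, ranked, label))
    return verdicts
```

Because each stage is passed in as a plain callable, a compact encoder or an LLM baseline could be swapped into any single stage without touching the others, which is the kind of component-level comparison the abstract describes.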

Problem

Research questions and friction points this paper is trying to address.

multilingual fact-checking

high-throughput

low-latency

scalability

veracity prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual fact-checking

fine-tuned compact models

modular pipeline