FaMTEB: Massive Text Embedding Benchmark in Persian Language

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
The absence of a systematic evaluation benchmark for Persian text embeddings hinders reproducible, standardized assessment. Method: This paper introduces FaMTEB, the first large-scale, multi-task, Persian-specific benchmark, built on an extended MTEB framework. It comprises 63 datasets spanning seven task categories: classification, clustering, pair classification, reranking, retrieval, summary retrieval (a newly proposed task), and semantic textual similarity; it also adds chatbot evaluation datasets to the MTEB ecosystem for the first time. The data combine translated, synthetically generated, and original Persian content, yielding multiple high-quality, publicly released Persian NLP datasets. Contribution/Results: FaMTEB is fully open-source, providing standardized evaluation code, unified protocols, and a public leaderboard. Experiments evaluate several Persian-specific and multilingual embedding models, substantially improving reproducibility and standardization in Persian text representation evaluation.

📝 Abstract
In this paper, we introduce a comprehensive benchmark for Persian (Farsi) text embeddings, built upon the Massive Text Embedding Benchmark (MTEB). Our benchmark includes 63 datasets spanning seven different tasks: classification, clustering, pair classification, reranking, retrieval, summary retrieval, and semantic textual similarity. The datasets are formed as a combination of existing, translated, and newly generated data, offering a diverse evaluation framework for Persian language models. Given the increasing use of text embedding models in chatbots, evaluation datasets are becoming inseparable ingredients in chatbot challenges and Retrieval-Augmented Generation systems. As a contribution, we include chatbot evaluation datasets in the MTEB benchmark for the first time. In addition, in this paper, we introduce the new task of summary retrieval which is not part of the tasks included in standard MTEB. Another contribution of this paper is the introduction of a substantial number of new Persian language NLP datasets suitable for training and evaluation, some of which have no previous counterparts in Persian. We evaluate the performance of several Persian and multilingual embedding models in a range of tasks. This work introduces an open-source benchmark with datasets, code and a public leaderboard.
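To make the retrieval-style evaluation concrete, here is a minimal sketch of the kind of metric such a benchmark standardizes: accuracy@1 for embedding-based retrieval via cosine similarity. The function names and toy vectors below are illustrative assumptions, not the paper's actual evaluation code; a real run would substitute embeddings produced by one of the evaluated models.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity matrix between two sets of embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def accuracy_at_1(query_emb, doc_emb, gold):
    # gold[i] = index of the relevant document for query i.
    top1 = cosine_sim(query_emb, doc_emb).argmax(axis=1)
    return float((top1 == np.asarray(gold)).mean())

# Toy 2-D embeddings standing in for a real model's output.
queries = np.array([[1.0, 0.1], [0.1, 1.0]])
docs = np.array([[0.9, 0.0], [0.0, 0.8], [0.5, 0.5]])
print(accuracy_at_1(queries, docs, gold=[0, 1]))  # 1.0
```

In practice a benchmark like this fixes the query/document splits, the gold labels, and the metric definitions, so that scores from different embedding models are directly comparable.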
Problem

Research questions and friction points this paper is trying to address.

Persian lacks a standardized, comprehensive text embedding benchmark
Summary retrieval is absent from the standard MTEB task set
Persian NLP datasets are scarce for many training and evaluation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive Persian text embeddings benchmark
Includes chatbot evaluation datasets
Introduces summary retrieval task
Erfan Zinvandi
Sharif University of Technology
Morteza Alikhani
Sharif University of Technology
Mehran Sarmadi
Sharif University of Technology
Zahra Pourbahman
Postdoctoral Researcher, Computer Engineering Department, Sharif University of Technology
LLM Evaluation, Neural Information Retrieval, Natural Language Processing, Data Mining
Sepehr Arvin
MCINext
Reza Kazemi
Sharif University of Technology
Arash Amini
Sharif University of Technology