CARTE: A Benchmark for Mapping Language Model Knowledge Across France

2026-06-01
AI Summary
This study addresses the lack of fine-grained evaluation benchmarks for large language models (LLMs) that capture intra-national regional variation, focusing on France. It introduces CARTE, the first multiple-choice benchmark covering the 13 metropolitan regions of France across 14 thematic domains, along with a dedicated subset, CARTE-LV, designed to assess linguistic variation across regions. CARTE provides a systematic framework for evaluating LLMs on region-specific knowledge spanning cultural, linguistic, demographic, and economic dimensions. The authors evaluate 27 LLMs of varying scales under few-shot settings and find performance disparities across regions, suggesting insufficient coverage of regional knowledge in current pretraining corpora and limited robustness to intra-national variation.
Abstract
We introduce CARTE (Culturally Anchored Regional-Territorial Evaluation), a multiple-choice benchmark for evaluating the ability of large language models (LLMs) to perform fine-grained reasoning over geographically grounded and regionally differentiated knowledge within France. While prior benchmarks focus on national-level cultural understanding, they largely overlook intra-country variation and the need to distinguish between closely related regional contexts. CARTE addresses this gap by introducing 2,431 questions spanning the 13 metropolitan regions of France and covering 14 thematic domains, including culture, language, demographics, economy, environment, and mobility. We further introduce CARTE-LV, a subset targeting Linguistic Variation across French regions, enabling focused evaluation of language-related differences. We evaluate 27 LLMs ranging from 1B to 12B parameters under few-shot settings. Our experiments reveal performance disparities across regions and model scales, suggesting systematic gaps in pretraining coverage and limited robustness to intra-national variation.
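As an illustration of how regional performance disparities on a multiple-choice benchmark might be quantified, the following is a minimal sketch and not the authors' evaluation code. The record fields (`region`, `gold`, `pred`), the function names, and the toy data are all hypothetical; the gap between the best and worst region is just one simple way to summarize disparity.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """Return per-region accuracy for multiple-choice predictions.

    `records` is an iterable of dicts with hypothetical keys:
    'region' (str), 'gold' (correct option), 'pred' (model's option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["region"]] += 1
        if r["pred"] == r["gold"]:
            correct[r["region"]] += 1
    return {region: correct[region] / total[region] for region in total}


def disparity(acc):
    """Gap between the best and worst region (assumes `acc` is non-empty)."""
    return max(acc.values()) - min(acc.values())


# Toy data for illustration only; not taken from CARTE.
records = [
    {"region": "Bretagne", "gold": "A", "pred": "A"},
    {"region": "Bretagne", "gold": "C", "pred": "B"},
    {"region": "Occitanie", "gold": "B", "pred": "B"},
    {"region": "Occitanie", "gold": "D", "pred": "D"},
]

acc = accuracy_by_region(records)
print(acc)             # per-region accuracy
print(disparity(acc))  # max-min accuracy gap
```

The same grouping could be applied per thematic domain or per model to compare disparities across scales.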
Problem

Research questions and friction points this paper is trying to address.

geographically grounded knowledge
regional variation
cultural understanding
intra-country variation
language model evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

geographically grounded evaluation
regional cultural knowledge
linguistic variation
intra-national benchmark
large language models