🤖 AI Summary
This study addresses the lack of culturally and linguistically grounded evaluation benchmarks for Bengali, a mid-resource language, by introducing BLUCK, the first multiple-choice question (MCQ) benchmark centered on native geographical, historical, and linguistic knowledge (2,366 items across 23 categories). Methodologically, the authors use human-written questions curated from examination collections, benchmark nine LLMs (six proprietary and three open-source), and analyze performance by category. Key contributions include: (1) the first MCQ evaluation benchmark explicitly aligned with Bengali's native culture, history, and linguistics; (2) a fine-grained, per-category assessment of cultural and linguistic knowledge for a mid-resource language; and (3) empirical evidence that LLMs perform reasonably well overall but struggle in some areas of Bengali phonetics, which provides a quantitative baseline for improving culture-aware model performance in Bengali.
📝 Abstract
In this work, we introduce BLUCK, a new dataset designed to measure the performance of Large Language Models (LLMs) in Bengali linguistic understanding and cultural knowledge. Our dataset comprises 2,366 multiple-choice questions (MCQs) carefully curated from compiled collections of several college-level and job-level examinations, and it spans 23 categories covering Bangladesh's culture and history and Bengali linguistics. We benchmarked six proprietary and three open-source LLMs on BLUCK, including GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, Llama-3.3-70B-Instruct, and DeepSeekV3. Our results show that while these models perform reasonably well overall, they struggle in some areas of Bengali phonetics. Although current LLMs' performance on Bengali cultural and linguistic contexts is still not comparable to their performance on mainstream languages like English, our results indicate Bengali's status as a mid-resource language. Importantly, BLUCK is also the first MCQ-based evaluation benchmark centered on native Bengali culture, history, and linguistics.
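The per-category comparison described in the abstract can be illustrated with a minimal sketch of MCQ scoring. This is not the authors' evaluation code: the record fields (`id`, `category`, `answer`), the function name `per_category_accuracy`, and the toy data are all hypothetical, chosen only to show how accuracy might be broken down by category so that weak areas such as phonetics become visible.

```python
from collections import defaultdict


def per_category_accuracy(items, predictions):
    """Return {category: accuracy} for a set of MCQ items.

    items: list of dicts with hypothetical keys 'id', 'category', 'answer'.
    predictions: dict mapping item id -> predicted option label.
    A missing prediction is counted as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        category = item["category"]
        total[category] += 1
        if predictions.get(item["id"]) == item["answer"]:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Toy data for illustration only; not taken from BLUCK.
items = [
    {"id": 1, "category": "phonetics", "answer": "A"},
    {"id": 2, "category": "phonetics", "answer": "C"},
    {"id": 3, "category": "history", "answer": "B"},
]
predictions = {1: "A", 2: "B", 3: "B"}

scores = per_category_accuracy(items, predictions)
overall = sum(predictions.get(i["id"]) == i["answer"] for i in items) / len(items)
```

Reporting a score per category alongside the overall score is what allows a benchmark of this kind to show reasonable aggregate performance while still exposing a weak subdomain.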