Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited performance of current large language models on question answering in low-resource languages such as Greek, and the lack of systematic comparisons between monolingual and multilingual models on culturally contextualized tasks. The authors introduce DemosQA, the first Greek question-answering benchmark to integrate sociocultural context, constructed from social media data and validated through community review. They also propose a memory-efficient, scalable evaluation framework and conduct a comprehensive assessment of 11 monolingual and multilingual large language models across six Greek datasets. Results demonstrate that monolingual models hold significant advantages on tasks requiring deep cultural and contextual understanding. The study releases both data and code, establishing a new benchmark and providing practical tools for advancing question-answering research in low-resource languages.

📝 Abstract
Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite these advancements, research on LLMs has primarily targeted high-resourced languages (e.g., English), and only recently has attention shifted toward multilingual models. However, these models demonstrate a training data bias towards a small number of popular languages or rely on transfer learning from high- to under-resourced languages; this may lead to a misrepresentation of social, cultural, and historical aspects. To address this challenge, monolingual LLMs have been developed for under-resourced languages; however, their effectiveness remains less studied when compared to multilingual counterparts on language-specific tasks. In this study, we address this research gap in Greek QA by contributing: (i) DemosQA, a novel dataset, which is constructed using social media user questions and community-reviewed answers to better capture the Greek social and cultural zeitgeist; (ii) a memory-efficient LLM evaluation framework adaptable to diverse QA datasets and languages; and (iii) an extensive evaluation of 11 monolingual and multilingual LLMs on 6 human-curated Greek QA datasets using 3 different prompting strategies. We release our code and data to facilitate reproducibility.
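The abstract mentions a memory-efficient evaluation framework without detailing it. A common pattern for evaluating many LLMs on limited hardware is to hold at most one model in memory at a time, releasing it before loading the next. The sketch below illustrates that pattern only; the paper's actual framework is not reproduced here, and the function names, stub loader, and stub scorer are all hypothetical.

```python
import gc

def evaluate_sequentially(model_names, datasets, load_model, score):
    """Evaluate each model on every dataset while keeping at most one
    model in memory at a time (hypothetical sketch, not the paper's code).

    load_model and score are caller-supplied callables standing in for
    real checkpoint loading and QA metric computation."""
    results = {}
    for name in model_names:
        model = load_model(name)  # load one checkpoint at a time
        for ds_name, examples in datasets.items():
            results[(name, ds_name)] = score(model, examples)
        del model                 # drop the reference before the next load
        gc.collect()              # reclaim memory immediately
    return results

# Toy usage with stubs standing in for real LLM calls:
models = ["mono-greek-llm", "multi-llm"]
data = {"DemosQA": ["q1", "q2"], "OtherQA": ["q3"]}
res = evaluate_sequentially(
    models, data,
    load_model=lambda n: n.upper(),   # stub "model"
    score=lambda m, ex: len(ex),      # stub metric: dataset size
)
```

With real models, `load_model` would typically map to a framework loader and `del`/`gc.collect()` would be paired with freeing accelerator memory, but the control flow is the same: sequential loading bounds peak memory by the largest single model rather than the sum of all 11.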
Problem

Research questions and friction points this paper is trying to address.

Low-resource languages
Question Answering
Language bias
Monolingual LLMs
Multilingual LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

monolingual LLMs
multilingual LLMs
Greek question answering
DemosQA benchmark
efficient evaluation framework