MATA (māta): Mindful Assessment of the Telugu Abilities of Large Language Models

📅 2025-08-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work systematically evaluates the linguistic understanding capabilities and limitations of 11 large language models (LLMs) on Telugu, a low-resource language. To this end, we construct a high-quality evaluation benchmark of 729 multiple-choice and open-ended questions and propose a fine-grained assessment framework. Our analysis, which integrates human annotation, multidimensional linguistic test design, response-behavior analysis, and both human and LLM-as-a-judge evaluation, reveals that current LLMs often rely on superficial heuristics (e.g., answer option position) rather than deep semantic comprehension. Results show significant performance degradation on Telugu tasks and uncover systematic discrepancies between LLM-based and human evaluations. This study establishes a new benchmark, introduces a fine-grained evaluation methodology, and provides foundational insights for NLP evaluation in low-resource languages.

📝 Abstract
In this paper, we introduce MATA, a novel evaluation dataset to assess the ability of Large Language Models (LLMs) in Telugu language, comprising 729 carefully curated multiple-choice and open-ended questions that span diverse linguistic dimensions. We evaluate 11 open-weight and closed-source LLMs on our dataset and present a fine-grained analysis of their performance. Further, we empirically show how LLMs rely on superficial heuristics such as answer position and distractor patterns for multiple-choice questions. Finally, we also compare LLM-as-a-judge evaluation with human evaluation for open-ended questions and draw some conclusions on its reliability in a low-resource language. We argue that such fine-grained evaluation is essential for understanding model limitations and can inform the development of more linguistically capable LLMs, while also serving as a foundation for future research in Telugu NLP.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM Telugu language abilities across diverse linguistic dimensions
Analyzing LLM reliance on superficial heuristics in multiple-choice questions
Assessing reliability of LLM-as-judge evaluation versus human evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel Telugu evaluation dataset creation
Fine-grained analysis of multiple model performances
Empirical demonstration of heuristic reliance in models
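One way to probe the kind of position-based heuristic the paper reports is to re-shuffle the answer options of each multiple-choice question across several trials and check whether the model keeps selecting the same *content* rather than the same slot. The sketch below is illustrative only: the function names, question schema, and `answer_fn` interface are assumptions, not the paper's actual evaluation protocol.

```python
import random

def position_shuffled_accuracy(questions, answer_fn, trials=3, seed=0):
    """Estimate position bias by permuting option order on each trial.

    `questions`: list of dicts with keys 'prompt', 'options' (list of
    strings), and 'answer' (the correct option string).
    `answer_fn`: any callable mapping (prompt, options) -> chosen option
    string (e.g., a wrapper around an LLM call).
    A model that tracks content keeps its accuracy under shuffling; a
    model that always picks a fixed slot collapses toward chance.
    """
    rng = random.Random(seed)
    correct = total = 0
    for q in questions:
        for _ in range(trials):
            opts = q["options"][:]
            rng.shuffle(opts)  # new option order each trial
            if answer_fn(q["prompt"], opts) == q["answer"]:
                correct += 1
            total += 1
    return correct / total

# Toy models for illustration: one that knows the answer, and one that
# blindly picks the second slot (a pure position heuristic).
oracle = lambda prompt, options: "4"
always_second = lambda prompt, options: options[1]

qs = [{"prompt": "2+2=?", "options": ["3", "4", "5", "6"], "answer": "4"}]
```

Under this probe, `oracle` stays at accuracy 1.0 regardless of option order, while `always_second` drifts toward 1/len(options) as trials grow, which is one signature of the slot-based shortcut behavior the paper analyzes.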
Chalamalasetti Kranti
University of Potsdam, Germany
Sowmya Vajjala
National Research Council, Canada
Natural Language Processing