📝 Abstract
This report evaluates how well text-in, text-out Large Language Models (LLMs) understand and generate Indic languages. The evaluation is used to identify and prioritize Indic languages for inclusion in safety benchmarks. We conduct the study by reviewing existing evaluation studies and datasets, along with a set of twenty-eight LLMs that support Indic languages. We analyze the LLMs on the basis of their training data, model and data licenses, type of access, and model developers. We also compare Indic language performance across evaluation datasets and find significant performance disparities across Indic languages. Hindi is the most widely represented language in these models. While model performance roughly correlates with the number of speakers for the top five languages, the correlation weakens beyond that.