DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

📅 2025-09-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing benchmarks for the cultural understanding of generative AI lack fine-grained coverage of India's cultural diversity, particularly for low-resource languages and less-documented traditions. Method: We introduce DRISHTIKON, a multimodal, multilingual evaluation benchmark centered exclusively on Indian culture. It comprises over 64,000 aligned text-image pairs across 15 languages, covers all Indian states and union territories, and spans cultural themes including festivals, attire, cuisines, art forms, and historical heritage. A wide range of vision-language models, both open-source and proprietary, is evaluated in zero-shot and chain-of-thought settings. Contribution/Results: Experiments expose key limitations in current models' culturally grounded multimodal reasoning, especially for low-resource languages and less-documented traditions. DRISHTIKON thus offers a challenging testbed for developing culturally aware AI systems.

📝 Abstract

We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage, among many others. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.

Problem

Research questions and friction points this paper is trying to address.

Evaluating AI's cultural understanding of Indian traditions and heritage

Testing multimodal reasoning across 15 languages and all Indian states and union territories

Addressing limitations in AI's handling of low-resource cultural content

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal multilingual benchmark for Indian culture

Evaluates vision-language models across diverse cultural themes

Tests models in zero-shot and chain-of-thought settings
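
To make the difference between the zero-shot and chain-of-thought settings concrete, the following is a minimal sketch of how one item might be prompted in each setting and how accuracy could be broken down by language. Everything in it is an assumption for illustration: the summary above does not describe the dataset schema, the question format, or the prompts used, so the multiple-choice layout, the `Item` fields, the instruction wording, and the function names `build_prompt` and `accuracy_by_language` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical benchmark item; field names are illustrative, not the dataset schema."""
    language: str
    question: str
    options: List[str]
    answer_index: int


def build_prompt(item: Item, chain_of_thought: bool = False) -> str:
    """Build the text half of a prompt; the paired image would be passed to the VLM separately."""
    letters = "ABCDEFGH"
    lines = [item.question]
    lines += [f"{letters[i]}. {option}" for i, option in enumerate(item.options)]
    if chain_of_thought:
        # Assumed wording: ask for intermediate reasoning before the final answer.
        lines.append("Think step by step about the image and its cultural context, "
                     "then give the letter of the correct option.")
    else:
        # Assumed wording: zero-shot, answer directly with no examples or reasoning.
        lines.append("Answer with the letter of the correct option only.")
    return "\n".join(lines)


def accuracy_by_language(items: List[Item], predictions: List[int]) -> Dict[str, float]:
    """Group items by language and return the fraction answered correctly in each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted_index in zip(items, predictions):
        total[item.language] += 1
        correct[item.language] += int(predicted_index == item.answer_index)
    return {language: correct[language] / total[language] for language in total}
```

Reporting accuracy per language rather than as a single aggregate is what would let gaps on low-resource languages show up, which is the kind of limitation the abstract says the results expose.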
