🤖 AI Summary
Existing benchmarks for evaluating the cultural understanding of generative AI have a generic or global scope and lack fine-grained coverage of India's cultural diversity, particularly for low-resource languages and less-documented traditions.
Method: We introduce DRISHTIKON, a first-of-its-kind multimodal, multilingual evaluation benchmark centered exclusively on Indian culture. It comprises over 64,000 aligned text-image pairs across 15 languages and all Indian states and union territories, spanning cultural themes including festivals, attire, cuisines, art forms, and historical heritage. The evaluation covers both open- and closed-source vision-language models in zero-shot and chain-of-thought settings.
Contribution/Results: Experiments expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, especially for low-resource languages and less-documented traditions. DRISHTIKON thereby offers a robust testbed for developing culturally aware AI systems.
📝 Abstract
We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage, among many others. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.