DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

📅 2025-09-23
🤖 AI Summary
Problem: Existing benchmarks for the cultural understanding of generative AI lack fine-grained coverage of India's cultural diversity, particularly for low-resource languages and less-documented traditions. Method: The authors introduce DRISHTIKON, a first-of-its-kind multimodal, multilingual evaluation benchmark centered exclusively on Indian culture. It comprises over 64,000 aligned text-image pairs across 15 languages, covers all states and union territories, and spans cultural themes including festivals, attire, cuisines, art forms, and historical heritage. The evaluation covers open-source and proprietary vision-language models, including reasoning-specialized and Indic-focused ones, in zero-shot and chain-of-thought settings. Contribution/Results: Experiments expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, especially for low-resource languages and less-documented traditions. DRISHTIKON offers a robust testbed for developing culturally aware AI systems.

📝 Abstract
We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage, among many others. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.
Problem

Research questions and friction points this paper is trying to address.

Evaluating AI's cultural understanding of Indian traditions and heritage
Testing multimodal reasoning across 15 Indian languages and regions
Addressing limitations in AI's handling of low-resource cultural content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal multilingual benchmark for Indian culture
Evaluates vision-language models across diverse cultural themes
Tests models with zero-shot and chain-of-thought settings
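As a rough illustration of the two evaluation settings named above, the following minimal sketch builds a zero-shot and a chain-of-thought prompt for one question and aggregates accuracy per language. Everything in it is an assumption made for illustration: the multiple-choice format, the instruction wording, the record fields (`language`, `predicted`, `gold`), and the toy records are hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict

# Hypothetical instruction strings; the paper's actual prompts are not given here.
ZERO_SHOT = "Answer the question about the image. Reply with the option letter only."
CHAIN_OF_THOUGHT = (
    "Answer the question about the image. Think step by step, "
    "then give the option letter on the last line as 'Answer: <letter>'."
)


def build_prompt(question, options, setting):
    """Build the text part of a prompt for the 'zero_shot' or 'cot' setting."""
    instruction = CHAIN_OF_THOUGHT if setting == "cot" else ZERO_SHOT
    lines = [instruction, f"Question: {question}"]
    # Label options A, B, C, ... (assumes a multiple-choice format).
    lines += [f"{letter}. {text}" for letter, text in zip("ABCD", options)]
    return "\n".join(lines)


def accuracy_by_language(records):
    """Return {language: fraction of records where predicted == gold}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["language"]] += 1
        correct[record["language"]] += int(record["predicted"] == record["gold"])
    return {lang: correct[lang] / total[lang] for lang in total}


# Made-up records, only to show the expected input shape.
toy_records = [
    {"language": "hi", "predicted": "A", "gold": "A"},
    {"language": "hi", "predicted": "B", "gold": "A"},
    {"language": "ta", "predicted": "C", "gold": "C"},
]
scores = accuracy_by_language(toy_records)
```

Reporting accuracy per language rather than as a single pooled number is what makes gaps on low-resource languages visible, which is the kind of comparison the benchmark is meant to support.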
Arijit Maji
Indian Institute of Technology Patna, India
Raghvendra Kumar
Indian Institute of Technology Patna, India
Akash Ghosh
Indian Institute of Technology Patna, India
Anushka
Banasthali Vidyapeeth University, Rajasthan, India
Nemil Shah
Pandit Deendayal Energy University, India
Abhilekh Borah
Undergraduate Student, Manipal University
Multimodal AI, Trustworthy AI, AI Alignment
Vanshika Shah
Dwarkadas J. Sanghvi College of Engineering, India
Nishant Mishra
Indian Institute of Technology Patna, India
Sriparna Saha
Indian Institute of Technology Patna, India