Culturally-Grounded Chain-of-Thought (CG-CoT): Enhancing LLM Performance on Culturally-Specific Tasks in Low-Resource Languages

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit poor performance on culture-specific reasoning tasks in low-resource languages—e.g., Yoruba proverb comprehension—hindering their equitable global deployment. To address this, we propose Culturally-Grounded Chain-of-Thought (CG-CoT), a novel prompting framework that integrates dense vector retrieval for localized cultural context acquisition, explicit chain-of-thought reasoning to guide culturally grounded inference, and a dual verification mechanism combining LLM self-assessment with human-in-the-loop validation. Experiments demonstrate substantial gains in cultural alignment accuracy and reasoning depth. Crucially, we uncover a fundamental misalignment between conventional translation metrics (e.g., BLEU) and cultural relevance evaluation, thereby advocating a paradigm shift in low-resource NLP assessment. To our knowledge, this is the first work to systematically unify cultural retrieval, structured reasoning, and multi-tiered validation to enhance LLMs' cultural intelligence.

📝 Abstract
Large Language Models (LLMs) struggle with culturally-specific reasoning tasks, particularly in low-resource languages, hindering their global applicability. Addressing this gap is crucial for equitable AI deployment. We introduce Culturally-Grounded Chain-of-Thought (CG-CoT), a novel prompting strategy that combines dense vector retrieval of cultural context with explicit reasoning sequences. Our extensive experiments on Yoruba proverb interpretation demonstrate that CG-CoT provides significantly higher culturally-aligned accuracy and depth than traditional prompting methods, validated through both automated metrics and LLM-based evaluations. Notably, we uncover stark disparities between token-level translation metrics like BLEU and human-judged cultural relevance, suggesting a rethinking of evaluation approaches for low-resource NLP.
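The abstract's point about BLEU diverging from human-judged cultural relevance can be illustrated with a minimal sketch (not the paper's code). The proverb and interpretations below are hypothetical; we use a bare unigram precision as a stand-in for token-overlap metrics like BLEU:

```python
# Illustrative sketch: token-overlap scoring rewards literal similarity,
# not cultural fidelity. Unigram precision stands in for BLEU here.

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(t in ref for t in cand) / len(cand)

# Hypothetical example: a near-literal gloss shares almost every token with
# the reference, while a culturally faithful paraphrase shares none, even
# though it captures the proverb's intended lesson.
reference = "the river that forgets its source will dry up"
literal = "the river that forgets its source will dry"
faithful = "people who abandon their roots lose their strength"

print(unigram_precision(literal, reference))   # high overlap
print(unigram_precision(faithful, reference))  # no overlap despite fidelity
```

A metric driven purely by such overlap would rank the literal gloss far above the faithful paraphrase, which is the disparity the paper reports against human judgments.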
Problem

Research questions and friction points this paper is trying to address.

Improving LLM performance on culturally-specific tasks in low-resource languages
Addressing disparities in cultural relevance evaluation metrics for NLP
Enhancing reasoning accuracy with culturally-grounded prompting strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense vector retrieval of cultural context
Explicit reasoning sequences in prompts
Culturally-aligned accuracy enhancement
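The pipeline these contributions describe—retrieve cultural context, then prompt with explicit reasoning steps—can be sketched as follows. This is an assumed, minimal interpretation, not the paper's implementation: a character n-gram bag stands in for a real dense encoder, and the cultural notes are hypothetical placeholders:

```python
# Minimal CG-CoT-style sketch: dense retrieval of cultural context followed
# by a chain-of-thought prompt. All interfaces here are assumptions; a real
# system would use a trained sentence encoder and curated Yoruba sources.
from collections import Counter
import math

def embed(text: str, n: int = 3) -> Counter:
    # Character n-gram counts as a toy stand-in for a dense embedding model.
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Rank corpus documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_cg_cot_prompt(proverb: str, corpus: list[str], k: int = 1) -> str:
    # Prepend retrieved cultural context, then ask for step-by-step reasoning.
    context = "\n".join(retrieve(proverb, corpus, k))
    return (
        f"Cultural context:\n{context}\n\n"
        f"Proverb: {proverb}\n"
        "Reason step by step about the cultural meaning before answering."
    )

# Hypothetical cultural notes for illustration only.
notes = [
    "Yoruba proverbs often teach respect for elders and community.",
    "Rivers in Yoruba oral tradition symbolize lineage and origin.",
]
prompt = build_cg_cot_prompt("A river that forgets its source will dry up.", notes)
print(prompt)
```

The dual verification stage mentioned in the summary (LLM self-assessment plus human-in-the-loop review) would sit downstream of this prompt, checking the model's answer before it is accepted.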