🤖 AI Summary
Large language models (LLMs) perform poorly on culture-specific reasoning tasks in low-resource languages (e.g., Yoruba proverb interpretation), hindering their equitable global deployment. To address this, we propose Culturally-Grounded Chain-of-Thought (CG-CoT), a novel prompting strategy that combines dense vector retrieval of localized cultural context with explicit chain-of-thought reasoning to guide culturally grounded inference, validated through both automated metrics and LLM-based evaluations. Experiments on Yoruba proverb interpretation demonstrate substantial gains in culturally aligned accuracy and reasoning depth over traditional prompting methods. Crucially, we uncover a fundamental misalignment between token-level translation metrics (e.g., BLEU) and human-judged cultural relevance, advocating a rethinking of evaluation approaches for low-resource NLP. To our knowledge, this is the first work to systematically unify cultural retrieval and structured reasoning to enhance LLMs' cultural intelligence.
📝 Abstract
Large Language Models (LLMs) struggle with culture-specific reasoning tasks, particularly in low-resource languages, hindering their global applicability. Addressing this gap is crucial for equitable AI deployment. We introduce Culturally-Grounded Chain-of-Thought (CG-CoT), a novel prompting strategy that combines dense vector retrieval of cultural context with explicit reasoning sequences. Our extensive experiments on Yoruba proverb interpretation demonstrate that CG-CoT achieves significantly higher culturally aligned accuracy and reasoning depth than traditional prompting methods, validated through both automated metrics and LLM-based evaluations. Notably, we uncover stark disparities between token-level translation metrics like BLEU and human-judged cultural relevance, suggesting a rethinking of evaluation approaches for low-resource NLP.
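The core mechanism described above, retrieving culturally relevant context via dense vector similarity and prepending it to an explicit step-by-step reasoning prompt, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy encoder, the mini-corpus of cultural notes, and the prompt wording are all assumptions introduced here for clarity.

```python
# Sketch of a CG-CoT-style prompt pipeline: dense retrieval of cultural
# context followed by an explicit chain-of-thought scaffold.
# NOTE: the encoder, corpus, and prompt text below are illustrative
# stand-ins, not the authors' actual components.
import numpy as np


def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic pseudo-embedding standing in for a real dense encoder
    (e.g., a multilingual sentence encoder in the actual system)."""
    rng = np.random.default_rng(sum(text.encode()))  # stable seed from bytes
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


# Hypothetical mini-corpus of cultural context notes.
corpus = [
    "Yoruba proverbs often use animals to comment on human character.",
    "Elders traditionally deliver proverbs to instruct the young.",
    "Many proverbs reference farming and the rhythm of the seasons.",
]
corpus_vecs = np.stack([embed(doc) for doc in corpus])


def retrieve(query: str, k: int = 2) -> list[str]:
    """Dense retrieval: rank corpus notes by cosine similarity to the query
    (vectors are unit-normalized, so the dot product is cosine similarity)."""
    sims = corpus_vecs @ embed(query)
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]


def build_cg_cot_prompt(proverb: str) -> str:
    """Combine retrieved cultural context with an explicit reasoning scaffold."""
    context = "\n".join(f"- {c}" for c in retrieve(proverb))
    return (
        f"Cultural context:\n{context}\n\n"
        f"Proverb: {proverb}\n"
        "Think step by step: (1) literal meaning, "
        "(2) cultural references, (3) intended lesson.\n"
        "Answer:"
    )


prompt = build_cg_cot_prompt("Bi a ba n gun igi, a o bojuwo eyin.")
```

The resulting `prompt` string would be sent to the LLM; the retrieved notes ground the model's reasoning steps in cultural knowledge it may lack for low-resource languages.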