Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

📅 2025-03-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how the absence of familiar cultural context affects the robustness of large language models (LLMs) in mathematical reasoning. Addressing cultural representation bias in mainstream training data, we construct six controlled culturally rewritten datasets derived from GSM8K, systematically substituting culturally specific elements—such as names, locations, and foods—while preserving logical structure. Using zero-shot mathematical reasoning evaluation across model scales (from small to large), we first identify cultural context as an implicit source of interference: average performance drops by 12.7%, with small models suffering up to 23.4% degradation. Crucially, we find that cultural familiarity can substantially compensate for limited mathematical capability—some small models even outperform larger ones in culturally aligned settings. These findings challenge the "scale implies robustness" hypothesis and establish cultural prior knowledge as a critical latent variable governing LLM mathematical reasoning performance.

📝 Abstract
Large Language Models (LLMs) have significantly advanced various fields, particularly coding, mathematical reasoning, and logical problem solving. However, a critical question remains: Do these mathematical reasoning abilities persist when LLMs are presented with culturally adapted math problems? Specifically, how do LLMs perform when faced with math problems embedded in cultural contexts that have no significant representation in mainstream web-scale AI training data? To explore this, we generated six synthetic cultural datasets from GSM8K, a widely used benchmark for assessing LLMs' mathematical reasoning skills. While preserving the mathematical logic and numerical values of the original GSM8K test set, we modify cultural elements such as personal names, food items, place names, etc. These culturally adapted datasets provide a more reliable framework for evaluating LLMs' mathematical reasoning under shifting cultural contexts. Our findings reveal that LLMs struggle with math problems when cultural references change, even though the underlying mathematical structure remains constant. Smaller models exhibit greater performance drops compared to larger models. Interestingly, our results also suggest that cultural familiarity can enhance mathematical reasoning. Even models with no explicit mathematical training but exposure to relevant cultural contexts sometimes outperform larger, mathematically proficient models on culturally embedded math problems. This study highlights the impact of cultural context on the mathematical reasoning abilities of LLMs, underscoring the need for more diverse and representative training data to improve robustness in real-world applications. The benchmark datasets and the script for reproducing the results are available at https://github.com/akarim23131/Lost_in_Cultural_Translation
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' math performance in diverse cultural contexts
Exploring cultural adaptation's impact on LLM reasoning accuracy
Identifying training data gaps for robust real-world applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generated synthetic cultural datasets from GSM8K
Modified cultural elements while preserving math logic
Evaluated LLMs' performance across cultural contexts
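The rewriting step above (swapping names, foods, and places while leaving every numeral and mathematical relation untouched) could be sketched roughly as below. The mapping entries and the `adapt_problem` helper are invented for illustration; the paper's six datasets use their own substitution tables, which are not reproduced here.

```python
import re

# Hypothetical substitution map: surface terms common in GSM8K
# mapped to culturally specific counterparts. Illustrative only;
# not the paper's actual mappings.
CULTURAL_MAP = {
    "Natalia": "Aaliyah",
    "muffins": "samosas",
    "the supermarket": "the bazaar",
}

def adapt_problem(text: str, mapping: dict) -> str:
    """Swap culturally specific surface terms while leaving all
    numerals and mathematical structure untouched."""
    for original, replacement in mapping.items():
        # Whole-word replacement, so numbers and operators in the
        # problem statement are never modified.
        text = re.sub(rf"\b{re.escape(original)}\b", replacement, text)
    return text

problem = "Natalia sold 48 muffins at the supermarket in April."
print(adapt_problem(problem, CULTURAL_MAP))
# → Aaliyah sold 48 samosas at the bazaar in April.
```

Because only surface tokens change, the adapted problem has an identical answer key to the original, which is what lets the paper attribute any accuracy drop to the cultural shift rather than to the math.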