🤖 AI Summary
Large language models (LLMs) often lack cultural representation and diversity in their generations, whether they are expressing opinions or answering factual questions. To address this, the authors propose multilingual prompting: starting from a base prompt, the method builds several variants that add cultural and linguistic cues from different cultures, generates a response for each variant, and combines the results. The approach builds on evidence that LLMs hold language-specific knowledge, and aims to activate a broader range of the cultural knowledge embedded in their training data. Experiments on GPT-4o, GPT-4o-mini, and LLaMA 70B/8B show that multilingual prompting consistently outperforms existing diversity-enhancing baselines: high-temperature sampling, step-by-step recall, and persona prompting. The benefits vary with language resource level and model size, and aligning the prompt language with the cultural cues reduces hallucination about culturally specific information.
📝 Abstract
Large Language Models (LLMs) are known to lack cultural representation and overall diversity in their generations, from expressing opinions to answering factual questions. To mitigate this problem, we propose multilingual prompting: a prompting method that generates several variations of a base prompt with added cultural and linguistic cues from several cultures, generates a response for each variation, and then combines the results. Building on evidence that LLMs have language-specific knowledge, multilingual prompting seeks to increase diversity by activating a broader range of cultural knowledge embedded in model training data. Through experiments across multiple models (GPT-4o, GPT-4o-mini, LLaMA 70B, and LLaMA 8B), we show that multilingual prompting consistently outperforms existing diversity-enhancing techniques such as high-temperature sampling, step-by-step recall, and persona prompting. Further analyses show that the benefits of multilingual prompting vary with language resource level and model size, and that aligning the prompting language with the cultural cues reduces hallucination about culturally specific information.
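The three steps named in the abstract (build prompt variants, generate responses, combine) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Culture` type, the cue template in `build_variants`, the order-preserving deduplication used as the "combine" step, and the `stub_generate` stand-in for a model call are all assumptions made for the example. In practice `generate` would wrap a call to whichever LLM is being evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Culture:
    """Hypothetical container pairing a cultural cue with a language cue."""
    name: str       # cultural cue, e.g. "Japan"
    language: str   # linguistic cue, e.g. "Japanese"


def build_variants(base_prompt: str, cultures: List[Culture]) -> Dict[str, str]:
    """Step 1: one prompt variant per culture.

    The template below is an assumed wording; the abstract does not
    specify how the cues are phrased.
    """
    return {
        c.name: (
            f"Answer in {c.language}, from the perspective of someone "
            f"living in {c.name}. {base_prompt}"
        )
        for c in cultures
    }


def combine(responses: List[str]) -> List[str]:
    """Step 3 (assumed): keep unique responses in first-seen order."""
    seen = set()
    combined = []
    for response in responses:
        key = response.strip().lower()
        if key not in seen:
            seen.add(key)
            combined.append(response.strip())
    return combined


def multilingual_prompt(
    base_prompt: str,
    cultures: List[Culture],
    generate: Callable[[str], str],
) -> List[str]:
    """Run all three steps with a caller-supplied `generate` function."""
    variants = build_variants(base_prompt, cultures)
    # Step 2: one response per variant.
    responses = [generate(prompt) for prompt in variants.values()]
    return combine(responses)


def stub_generate(prompt: str) -> str:
    """Toy stand-in for an LLM call, used only to exercise the pipeline."""
    if "Japan" in prompt:
        return "sushi"
    if "Mexico" in prompt:
        return "tacos"
    return "pizza"


if __name__ == "__main__":
    demo_cultures = [
        Culture("Japan", "Japanese"),
        Culture("Mexico", "Spanish"),
        Culture("Italy", "Italian"),
    ]
    print(multilingual_prompt("Name a popular dish.", demo_cultures, stub_generate))
```

Passing `generate` as a parameter keeps the sketch independent of any particular model API, so the same pipeline could be pointed at each of the models compared in the paper.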