🤖 AI Summary
This study addresses the systematic erasure of cultural-linguistic markers from non-native English varieties by large language models (LLMs) during workplace text polishing, resulting in linguistic identity loss. Introducing the concept of “cultural ghosting,” the authors propose two novel metrics—Identity Erasure Rate (IER) and Semantic Preservation Score (SPS)—to quantify the extent of cultural erasure in Indian, Singaporean, and Nigerian English texts. Through large-scale generation experiments, cross-variety linguistic analysis, and cultural marker classification, the research reveals an average IER of 10.26%, with pragmatic markers being disproportionately removed compared to lexical ones. A culturally aware prompting strategy reduces erasure by 29% without compromising semantic fidelity, exposing a paradox wherein high semantic preservation coexists with significant cultural erasure.
📝 Abstract
Large Language Models (LLMs) are increasingly used to "professionalize" workplace communication, often at the cost of linguistic identity. We introduce "Cultural Ghosting", the systematic erasure of linguistic markers unique to non-native English varieties during text processing. Through analysis of 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean, and Nigerian English) processed by five models under three prompt conditions, we quantify this phenomenon using two novel metrics: Identity Erasure Rate (IER) and Semantic Preservation Score (SPS). Across all prompts, we find an overall IER of 10.26%, with model-level variation from 3.5% to 20.5% (a 5.9x range). Crucially, we identify a Semantic Preservation Paradox: models maintain high semantic similarity (mean SPS = 0.748) while systematically erasing cultural markers. Pragmatic markers (politeness conventions) are 1.9x more vulnerable than lexical markers (71.5% vs. 37.1% erasure). Our experiments demonstrate that explicit cultural-preservation prompts reduce erasure by 29% without sacrificing semantic quality.
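The abstract does not give the exact formulation of IER, but a minimal sketch of one plausible reading, assuming IER is simply the fraction of a source text's annotated cultural markers that no longer appear in the polished output (the marker list, function name, and matching-by-substring are all illustrative assumptions, not the paper's method):

```python
def identity_erasure_rate(source_markers, polished_text):
    """Hypothetical IER: fraction of the source's cultural markers
    that are absent from the model's polished output.
    Matching here is naive case-insensitive substring search."""
    if not source_markers:
        return 0.0
    text = polished_text.lower()
    erased = sum(1 for m in source_markers if m.lower() not in text)
    return erased / len(source_markers)

# Toy example with Indian English markers: the polishing step keeps
# "kindly revert" but rewrites "do the needful" and "prepone" away.
markers = ["do the needful", "prepone", "kindly revert"]
polished = ("Please reschedule the meeting to an earlier slot and "
            "kindly revert with your availability.")
print(identity_erasure_rate(markers, polished))  # 2 of 3 markers erased
```

Under this reading, SPS would be computed independently (e.g., as embedding similarity between source and output), which is what allows the Semantic Preservation Paradox: IER can be high even when SPS is high.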