CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters

πŸ“… 2026-01-08
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of aligning large language models with diverse global user values, where cultural sparsity often leads to β€œmean collapse,” undermining the representation of cultural plurality. To mitigate this, the authors formulate cultural alignment as a conditional capacity disentanglement problem and propose the Cultural Mixture-of-Adapters (CuMA) framework. CuMA explicitly handles cultural sparsity and gradient interference through a demographic-aware mixture-of-experts architecture, latent cultural topology modeling, and gradient-decoupled training, enabling disentangled value alignment. Evaluated on the WorldValuesBench, Community Alignment, and PRISM benchmarks, CuMA achieves state-of-the-art performance, significantly outperforming dense models and semantics-driven MoE approaches while effectively preserving cultural diversity.

πŸ“ Abstract
As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that CuMA effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.
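To make the core mechanism concrete, here is a minimal sketch of demographic-aware mixture-of-adapters routing: a router conditioned on a user's demographic vector (rather than on token semantics) produces softmax gates over expert adapters, and the gated adapter deltas are added to the base hidden state. All names, shapes, and the per-dimension-scaling stand-in for an adapter are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(demographic, router_weights):
    """Gate logits: one dot product per expert with the demographic vector."""
    logits = [sum(w * d for w, d in zip(row, demographic)) for row in router_weights]
    return softmax(logits)

def mixture_output(hidden, demographic, router_weights, adapters):
    """Mix expert-adapter deltas by demographic-conditioned gates, with a residual.

    Each 'adapter' here is a simple per-dimension scaling, a toy stand-in
    for a LoRA-style low-rank delta.
    """
    gates = route(demographic, router_weights)
    out = [0.0] * len(hidden)
    for g, adapter in zip(gates, adapters):
        delta = [a * h for a, h in zip(adapter, hidden)]
        out = [o + g * d for o, d in zip(out, delta)]
    # Residual connection: base hidden state plus gated adapter deltas.
    return [h + o for h, o in zip(hidden, out)]

# Two demographic groups that prefer opposite experts: expert k's gate
# logit attends to demographic dimension k.
router_weights = [[4.0, 0.0], [0.0, 4.0]]
adapters = [[0.5, 0.5, 0.5], [-0.5, -0.5, -0.5]]
hidden = [1.0, 1.0, 1.0]

out_a = mixture_output(hidden, [1.0, 0.0], router_weights, adapters)
out_b = mixture_output(hidden, [0.0, 1.0], router_weights, adapters)
```

The point of the toy setup is that the same hidden state is pushed in opposite directions for the two demographic groups, which is exactly what a single dense parameter set cannot do when it averages conflicting gradients.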
Problem

Research questions and friction points this paper is trying to address.

Mean Collapse
Cultural Sparsity
Cultural Pluralism
Value Alignment
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cultural Sparsity
Mean Collapse
Mixture of Adapters
Demographic-Aware Routing
Latent Cultural Topology
πŸ”Ž Similar Papers
No similar papers found.
Ao Sun
Southeast University
Xiaoyu Wang
Southeast University
Zhe Tan
Southeast University
Yu Li
Southeast University, Monash University
Natural Language Processing, Large Language Models
Jiachen Zhu
ByteDance Inc.
Shu Su
Southeast University
Yuheng Jia
Southeast University