TriAlign: Towards Universal Truth Consistency in Personalized LLM Alignment

📅 2026-06-01

🤖 AI Summary
This work addresses the tension between personalization and fairness in large language models: adapting to individual user preferences can compromise the consistency and equity of answers to objective factual questions across social groups. To mitigate this, the authors formulate Truth-Invariant Alignment (TIA), an alignment problem that requires universal facts to remain consistent across social groups while preserving personalization. They introduce TriAlign, which they describe as the first offline multi-agent reinforcement learning framework for TIA. It models distinct social groups as interacting agents and combines a fairness-aware optimization objective with an explicit inconsistency penalty. Experiments show that TriAlign reduces inter-group disparities in factual responses while improving performance on objective tasks and retaining high-quality personalization, outperforming strong existing baselines.
📝 Abstract
Personalized large language models adapt responses to users' preferences and social attributes, but this can introduce substantial universal truth inconsistencies across social groups, where some groups systematically receive less accurate responses on objective tasks. Existing alignment methods either ignore personalization or focus mainly on subjective preference alignment, largely overlooking fairness and consistency in universal truths. To address this gap, we study Truth-Invariant Alignment (TIA), an alignment problem for personalized LLMs that aims to ensure universal truths remain consistent across social groups while preserving personalization. We propose TriAlign, the first offline multi-agent reinforcement learning (MARL) framework for TIA, in which each social group is modeled as an interacting agent. TriAlign jointly optimizes universal truth accuracy, cross-group truth consistency, and personalization through a fairness-aware objective and an explicit inconsistency penalty. Experiments across diverse benchmarks demonstrate that TriAlign achieves a stronger balance among these three objectives than strong baselines, reducing universal truth disparities across social groups while improving both objective task performance and personalization quality.
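The abstract names three goals (truth accuracy, cross-group consistency, personalization) but does not give TriAlign's exact formulation. The following is a minimal sketch, assuming a scalarized score in which the "fairness-aware" term blends mean and worst-group accuracy and the "inconsistency penalty" is the max-min accuracy gap between groups. All function names and the weights `lam_fair`, `lam_incons`, `lam_pers` are hypothetical; this is not the authors' objective and omits the offline MARL training entirely.

```python
from statistics import mean


def group_accuracies(correct_by_group: dict[str, list[bool]]) -> dict[str, float]:
    """Fraction of objective (universal-truth) questions answered correctly per group."""
    return {g: sum(c) / len(c) for g, c in correct_by_group.items()}


def truth_disparity(acc: dict[str, float]) -> float:
    """Hypothetical inconsistency measure: gap between best- and worst-served group."""
    return max(acc.values()) - min(acc.values())


def tia_objective(
    correct_by_group: dict[str, list[bool]],
    personalization_by_group: dict[str, list[float]],
    lam_fair: float = 0.5,    # hypothetical weight on worst-group accuracy
    lam_incons: float = 1.0,  # hypothetical weight on the inconsistency penalty
    lam_pers: float = 0.5,    # hypothetical weight on personalization quality
) -> float:
    """Illustrative scalar score to maximize; not TriAlign's actual objective."""
    acc = group_accuracies(correct_by_group)
    # Fairness-aware accuracy: interpolate between mean and worst-group accuracy.
    fair_acc = (1 - lam_fair) * mean(acc.values()) + lam_fair * min(acc.values())
    # Personalization: average per-group mean of (assumed) quality scores in [0, 1].
    pers = mean(mean(scores) for scores in personalization_by_group.values())
    # Reward accuracy and personalization, penalize cross-group truth gaps.
    return fair_acc + lam_pers * pers - lam_incons * truth_disparity(acc)
```

For example, with group A answering 3 of 4 factual questions correctly and group B only 1 of 4, the disparity is 0.5, and that gap is subtracted directly from the score. Weighting the worst-served group in the accuracy term targets the fairness goal, while the gap penalty pushes groups toward receiving equally accurate answers; raising `lam_incons` may trade some average accuracy for consistency.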
Problem

Research questions and friction points this paper is trying to address.

universal truth consistency
personalized LLM alignment
fairness
social groups
truth inconsistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Truth-Invariant Alignment
Personalized LLMs
Multi-Agent Reinforcement Learning
Fairness in AI
Cross-Group Consistency