Coherence Maximization Improves Pluralistic Alignment

📅 2026-06-01

🤖 AI Summary
This work addresses the challenge of aligning AI systems with diverse human values without human supervision. It uses Internal Coherence Maximization (ICM), a method that infers labels by maximizing their mutual predictability, to generate persona-specific examples that reflect the values of a target group without manual annotation. Across four benchmarks spanning classification, preference modeling, and open-ended generation, ICM-inferred in-context examples match the performance of gold labels. Coherence also matters beyond label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones. For personas underrepresented in pretraining data, human feedback targeted at the questions where the model is least certain generalizes better than the same number of labels on arbitrary questions.
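To make "mutual predictability" concrete, here is a minimal toy sketch, not the paper's implementation. It assumes each item is a single number and replaces the language model with a hypothetical stand-in predictor, a smoothed similarity-weighted vote over all other labeled items. The names `loo_log_prob`, `mutual_predictability`, and `greedy_coherence_search` are illustrative only. The score is the sum of leave-one-out log-probabilities, and a simple greedy search flips labels whenever a flip raises that score.

```python
import math
from typing import List, Sequence, Tuple


def loo_log_prob(features: Sequence[float], labels: Sequence[int], i: int,
                 smoothing: float = 1.0) -> float:
    """Log-probability of labels[i] under a similarity-weighted vote of the
    other items. This vote is a toy stand-in for a model's in-context
    prediction (an assumption of this sketch)."""
    weights = {0: smoothing, 1: smoothing}
    for j, (x, y) in enumerate(zip(features, labels)):
        if j == i:
            continue  # leave item i out of its own prediction
        weights[y] += math.exp(-abs(features[i] - x))
    return math.log(weights[labels[i]] / (weights[0] + weights[1]))


def mutual_predictability(features: Sequence[float], labels: Sequence[int]) -> float:
    """Sum of leave-one-out log-probabilities: how well each label is
    predicted from all the others."""
    return sum(loo_log_prob(features, labels, i) for i in range(len(labels)))


def greedy_coherence_search(features: Sequence[float], init_labels: Sequence[int],
                            max_sweeps: int = 10) -> Tuple[List[int], float]:
    """Flip one binary label at a time, keeping a flip only if the score rises."""
    labels = list(init_labels)
    best = mutual_predictability(features, labels)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(labels)):
            labels[i] ^= 1  # tentatively flip label i
            score = mutual_predictability(features, labels)
            if score > best + 1e-12:
                best, improved = score, True
            else:
                labels[i] ^= 1  # revert the flip
        if not improved:
            break
    return labels, best


# Two well-separated clusters of items (made-up data for illustration).
features = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
coherent = [0, 0, 0, 1, 1, 1]      # labels agree within each cluster
incoherent = [0, 1, 0, 1, 0, 1]    # labels alternate regardless of cluster

coherent_score = mutual_predictability(features, coherent)
incoherent_score = mutual_predictability(features, incoherent)
searched_labels, searched_score = greedy_coherence_search(features, incoherent)
```

In this toy, the cluster-consistent labeling scores higher than the alternating one, and the greedy search never lowers the score it starts from. A constant labeling also scores well under this simplified objective, so the sketch illustrates only the scoring idea, not a complete label-inference procedure.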
📝 Abstract
Aligning AI systems with diverse human values requires value specifications grounded in concrete examples, but generating such examples without extensive human supervision remains an open challenge. We investigate what makes these examples effective, using Internal Coherence Maximization (ICM) -- which infers labels by maximizing their mutual predictability -- to generate persona-specific examples that steer a model toward a target group's values, without human supervision. Across four benchmarks spanning classification, preference, and open-ended generation, ICM-inferred in-context examples match the performance of gold labels. Crucially, coherence matters beyond individual label accuracy: with accuracy held constant, more coherent examples generalize substantially better than incoherent ones. For personas underrepresented in pretraining data, targeted human feedback on the questions where the model is least certain about a persona's values yields better generalization than the same number of labels on arbitrary questions. These results identify coherence as a key design principle for scalable value specification, leveraging the diverse human perspectives already encoded in pretrained language models.
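The targeted-feedback idea can be sketched as a selection step: rank questions by how uncertain the model is about the persona's answer, then spend the labeling budget on the most uncertain ones. The abstract does not say how uncertainty is measured, so Shannon entropy of the predicted answer distribution is an assumption here, and `entropy`, `select_for_feedback`, and the example probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_feedback(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` question ids with the highest predictive entropy,
    most uncertain first."""
    ranked = sorted(predictions, key=lambda q: entropy(predictions[q]), reverse=True)
    return ranked[:budget]


# Made-up model probabilities over two answer options for one persona.
predictions = {
    "q1": [0.95, 0.05],  # model is fairly sure
    "q2": [0.50, 0.50],  # model is maximally unsure
    "q3": [0.70, 0.30],
}
to_label = select_for_feedback(predictions, budget=2)
```

With these numbers the selection picks `q2` and then `q3`, leaving the confidently predicted `q1` unlabeled. The baseline the abstract compares against corresponds to drawing the same number of questions without regard to uncertainty.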
Problem

Research questions and friction points this paper is trying to address.

value alignment
pluralistic values
example generation
human supervision
coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Internal Coherence Maximization
value alignment
in-context learning
scalable supervision
persona-specific generalization