🤖 AI Summary
Existing value alignment methods based on In-Context Alignment (ICA) suffer from an "Instruction Bottleneck": a single prompt cannot adequately reconcile diverse, or even conflicting, values (e.g., stimulation vs. tradition), resulting in incomplete or biased alignment. To address this, we propose PICACO, a fine-tuning-free in-context alignment framework centered on a meta-instruction optimization scheme that maximizes total correlation, explicitly modeling and strengthening multidimensional associations between specified values and model outputs. Our approach is applicable to both black-box and open-weight large language models and supports the simultaneous balancing of up to eight distinct values. Extensive experiments across five diverse, multi-value benchmark sets demonstrate that our method significantly outperforms multiple state-of-the-art baselines, achieving superior depth in value understanding and greater response balance.
📝 Abstract
In-Context Learning has shown great potential for aligning Large Language Models (LLMs) with human values, helping reduce harmful outputs and accommodate diverse preferences without costly post-training, an approach known as In-Context Alignment (ICA). However, how well LLMs comprehend input prompts remains uncertain, limiting ICA's ability to address value tensions: human values are inherently pluralistic and often impose conflicting demands, e.g., stimulation vs. tradition. Current ICA methods therefore face the Instruction Bottleneck challenge, where LLMs struggle to reconcile multiple intended values within a single prompt, leading to incomplete or biased alignment. To address this, we propose PICACO, a novel pluralistic ICA method. Without fine-tuning, PICACO optimizes a meta-instruction that navigates multiple values to better elicit LLMs' understanding of them and improve alignment. This is achieved by maximizing the total correlation between specified values and LLM responses, theoretically reinforcing value correlation while reducing distractive noise and yielding effective value instructions. Extensive experiments on five value sets show that PICACO works well with both black-box and open-source LLMs, outperforms several recent strong baselines, and achieves a better balance across up to 8 distinct values.
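For readers unfamiliar with the objective mentioned above, a sketch of the standard information-theoretic definition of total correlation may help; note this is the generic definition, and PICACO's exact optimization target may differ in its conditioning and parameterization. For random variables $V_1, \dots, V_n$ representing the $n$ specified values and $Y$ the LLM response:

```latex
% Total correlation (multi-information) among the value variables and the response:
% the sum of marginal entropies minus the joint entropy.
\mathrm{TC}(V_1, \dots, V_n, Y)
  = \sum_{i=1}^{n} H(V_i) + H(Y) - H(V_1, \dots, V_n, Y)
% Equivalently, the KL divergence from the joint to the product of marginals:
  = D_{\mathrm{KL}}\!\Big( p(v_1, \dots, v_n, y) \,\Big\|\, p(y)\prod_{i=1}^{n} p(v_i) \Big)
```

Maximizing this quantity encourages statistical dependence between every specified value and the generated response, which is the intuition behind "reinforcing value correlation" in the abstract.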