🤖 AI Summary
Existing methods for multi-objective alignment of large language models (LLMs)—e.g., instruction following, helpfulness, and conciseness—rely on fixed reward weights or uniform reward averaging, often leading to objective imbalance. This work proposes a *reasoning-time robust cooperative alignment* framework that, for the first time, formulates multi-objective decoding as a maximin game between reward weighting and sampling strategies, solving for the Nash equilibrium to optimize worst-case performance. We further design RMOD, a lightweight variant integrating convex optimization, game-theoretic principles, and controlled decoding to ensure both computational efficiency and inference stability. Empirical evaluation on multi-objective balanced alignment tasks shows up to 20% improvement over strong baselines; generated responses exhibit significantly enhanced consistency in quality across objectives, without sacrificing any individual objective’s performance.
📝 Abstract
Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs. Existing methods achieve alignment to multiple objectives simultaneously (e.g., instruction-following, helpfulness, conciseness) by optimizing their corresponding reward functions. However, they often rely on predefined weights or optimize for averages, sacrificing one objective for another and leading to unbalanced outcomes. To address this, we introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that optimizes for improving worst-case rewards. RMOD formalizes the robust decoding problem as a maximin two-player game between reward weights and the sampling policy, solving for the Nash equilibrium. We show that the game reduces to a convex optimization problem to find the worst-case weights, while the best response policy can be computed analytically. We also introduce a practical RMOD variant designed for efficient decoding with contemporary LLMs, incurring minimal computational overhead compared to non-robust Multi-Objective Decoding (MOD) methods. Our experimental results showcase the effectiveness of RMOD in generating responses equitably aligned with diverse objectives, outperforming baselines up to 20%.