Differential Voting: Loss Functions For Axiomatically Diverse Aggregation of Heterogeneous Preferences

📅 2026-01-25
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical limitation in current reinforcement learning from human feedback (RLHF) methods, which implicitly aggregate heterogeneous preferences without explicit control over social choice axioms, leading to opaque normative assumptions in the learned reward functions. To remedy this, the authors propose the Differential Voting framework, which for the first time reformulates classical voting rules—such as Copeland and Kemeny—as instance-level differentiable loss functions, ensuring that the optimization objective precisely aligns with a specified voting mechanism at the population level. Through consistency analysis, gradient field modeling, and asymptotic studies of smoothing parameters, the paper systematically uncovers the geometric structure of these losses and their correspondence to foundational social choice axioms, enabling principled axiom-based trade-offs in RLHF. Experiments confirm the alignment between the proposed method and its target voting rules, and the implementation is publicly released.

📝 Abstract
Reinforcement learning from human feedback (RLHF) implicitly aggregates heterogeneous human preferences into a single utility function, even though the underlying utilities of the participants are in practice diverse. Hence, RLHF can be viewed as a form of voting, where the aggregation mechanism is defined by the loss function. Although Arrow's Impossibility Theorem suggests that different mechanisms satisfy different sets of desirable axioms, most existing methods rely on a single aggregation principle, typically the Bradley-Terry-Luce (BTL) model, which corresponds to Borda count voting. This restricts the axiomatic properties of the learned reward and obscures the normative assumptions embedded in optimization. In this work, we introduce Differential Voting, a unifying framework that constructs instance-wise, differentiable loss functions whose population-level optima provably correspond to distinct classical voting rules. We develop differentiable surrogates for majority-based aggregation (BTL), Copeland, and Kemeny rules, and formally analyze their calibration properties, gradient fields, and limiting behavior as smoothing parameters vanish. For each loss, we establish consistency with the corresponding social choice rule and characterize the axioms it satisfies or violates. Our analysis shows how design choices in loss geometry, such as margin sensitivity and boundary concentration, directly translate into normative aggregation behavior. Differential Voting makes preference aggregation an explicit and controllable design choice in RLHF, enabling principled trade-offs between axiomatic guarantees and optimization stability. Code to reproduce our experiments is open-sourced.
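To make the abstract's contrast concrete, the following is a minimal sketch of the two loss families it mentions: the standard BTL pairwise loss (whose population optimum corresponds to Borda-count aggregation) and a hypothetical smoothed-sign surrogate whose vanishing-temperature limit counts pairwise wins and losses, as in Copeland's rule. The function names, the temperature parameter `tau`, and the exact smoothing form are illustrative assumptions, not the paper's actual construction.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def btl_loss(r_win: float, r_lose: float) -> float:
    """Bradley-Terry-Luce pairwise loss on reward-model scores.

    Margin-sensitive: the penalty grows smoothly with how far the
    winner's score falls below the loser's, which at the population
    level yields Borda-count-style aggregation.
    """
    return -math.log(sigmoid(r_win - r_lose))

def smoothed_copeland_loss(r_win: float, r_lose: float, tau: float = 0.1) -> float:
    """Hypothetical smoothed-sign surrogate (an assumption for illustration).

    As tau -> 0 this approaches the 0/1 indicator of losing the pairwise
    comparison, so the gradient concentrates near the decision boundary
    and the loss effectively counts wins and losses, as in Copeland's rule.
    """
    return sigmoid(-(r_win - r_lose) / tau)
```

The sketch illustrates the "loss geometry" point in the abstract: the BTL loss keeps penalizing even confident correct predictions in proportion to the margin, while the smoothed-sign surrogate saturates away from the boundary, so only the sign of the pairwise margin matters in the limit.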
Problem

Research questions and friction points this paper is trying to address.

preference aggregation
reinforcement learning from human feedback
voting rules
axiomatic diversity
heterogeneous preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Voting
Preference Aggregation
Differentiable Loss Functions
Social Choice Theory
Reinforcement Learning from Human Feedback
Zhiyu An
University of California, Merced
Duaa Nakshbandi
University of California, Merced
Wan Du
Computer Science and Engineering, UC Merced