PluralLLM: Pluralistic Alignment in LLMs via Federated Learning

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing three critical challenges in LLM alignment—privacy leakage, centralized dependency, and group unfairness—this paper proposes the first federated preference learning framework for LLM alignment. The method employs distributed reward modeling, leveraging Federated Averaging (FedAvg) to collaboratively train a Transformer-based preference predictor across decentralized user groups without sharing raw human feedback data, thereby preserving privacy and promoting fairness. Theoretical analysis and empirical evaluation on question-answering preference tasks show that the approach achieves 46% faster convergence and 4% higher alignment accuracy than centralized RLHF, while attaining group fairness comparable to centralized baselines—all under formal differential privacy guarantees and with modest communication overhead. The core contribution is the systematic integration of federated learning into LLM preference alignment, unifying privacy-preserving computation, decentralization, and fair, scalable alignment.
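The FedAvg aggregation step the summary describes can be sketched in a few lines: each user group trains a local copy of the preference predictor, and a server averages the parameters weighted by local dataset size. This is a minimal, hypothetical sketch of standard FedAvg (plain Python lists in place of real model tensors), not the paper's implementation.

```python
# Minimal FedAvg sketch (hypothetical; not the paper's code).
# Each client state maps parameter names to flat lists of floats;
# the server returns the size-weighted average of all client states.
from typing import Dict, List

def fedavg(client_states: List[Dict[str, List[float]]],
           client_sizes: List[int]) -> Dict[str, List[float]]:
    """Size-weighted parameter average (Federated Averaging)."""
    total = sum(client_sizes)
    aggregated: Dict[str, List[float]] = {}
    for key in client_states[0]:
        aggregated[key] = [0.0] * len(client_states[0][key])
        for state, n in zip(client_states, client_sizes):
            weight = n / total  # client's share of all training examples
            for i, value in enumerate(state[key]):
                aggregated[key][i] += weight * value
    return aggregated
```

In a real run, only these parameter updates cross the network; the raw preference data never leaves each group, which is what gives the privacy benefit over centralized RLHF data collection.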

📝 Abstract
Ensuring Large Language Models (LLMs) align with diverse human preferences while preserving privacy and fairness remains a challenge. Existing methods, such as Reinforcement Learning from Human Feedback (RLHF), rely on centralized data collection, making them computationally expensive and privacy-invasive. We introduce PluralLLM, a federated learning-based approach that enables multiple user groups to collaboratively train a transformer-based preference predictor without sharing sensitive data, which can also serve as a reward model for aligning LLMs. Our method leverages Federated Averaging (FedAvg) to aggregate preference updates efficiently, achieving 46% faster convergence, a 4% improvement in alignment scores, and nearly the same group fairness measure as in centralized training. Evaluated on a Q/A preference alignment task, PluralLLM demonstrates that federated preference learning offers a scalable and privacy-preserving alternative for aligning LLMs with diverse human values.
Problem

Research questions and friction points this paper is trying to address.

Align LLMs with diverse human preferences
Preserve privacy and fairness in LLM training
Enable collaborative training without sharing sensitive data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning for privacy-preserving LLM alignment
Federated Averaging for efficient preference aggregation
Transformer-based predictor for diverse human preferences
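The transformer-based preference predictor named above is typically trained with a pairwise objective: the model should score the preferred answer higher than the rejected one. A common choice is the Bradley-Terry loss used for reward models; the sketch below is an assumption about that standard formulation, not code from the paper.

```python
# Hypothetical sketch of a pairwise (Bradley-Terry) preference loss,
# as commonly used to train reward/preference models.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """-log(sigmoid(score_chosen - score_rejected)), computed stably."""
    diff = score_chosen - score_rejected
    # Stable form of log(1 + exp(-diff)) for both signs of diff.
    return math.log1p(math.exp(-abs(diff))) + max(-diff, 0.0)
```

The loss is log(2) when the two answers score equally and shrinks as the predictor separates the preferred answer from the rejected one, which is the gradient signal each group's local model trains on before FedAvg aggregation.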
Mahmoud Srewa
University of California, Irvine
Tianyu Zhao
University of California, Irvine
Salma Elmalaki
EECS Department at University of California, Irvine
Human Factors · CPS · Mobile Computing · Extended Reality