UFO: Unfair-to-Fair Evolving Mitigates Unfairness in LLM-based Recommender Systems via Self-Play Fine-tuning

📅 2025-11-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work reveals that item-side unfairness in large language model–based recommender systems (LRSs) originates not only from supervised fine-tuning (SFT) but is fundamentally rooted in inherent biases acquired during pretraining—and subsequently amplified during SFT. To address this root cause, we propose a self-play fine-tuning framework featuring a dual-role adversarial paradigm: a *judger* that identifies bias and a *corrector* that mitigates it. The framework integrates self-play generation, bias detection and correction networks, sample reweighting, and dynamic feedback—enabling end-to-end, label-free fairness optimization. Extensive experiments demonstrate that our method significantly reduces item-side unfairness across multiple benchmarks (average reduction of 32.7%) while simultaneously improving recommendation accuracy (average +4.1% Recall@10). To the best of our knowledge, this is the first approach for LRSs to achieve *joint improvement* of fairness and utility.

Technology Category

Application Category

📝 Abstract

Large language model-based Recommender Systems (LRSs) have demonstrated superior recommendation performance by integrating pre-training with Supervised Fine-Tuning (SFT). However, this approach introduces item-side unfairness. Existing studies primarily attribute this issue to the absence of fairness constraints during SFT and attempt to mitigate unfairness via re-weighting and re-ranking methods. In this paper, we find that unfairness arises not only from SFT but also from pre-training, where inherent biases are further amplified during SFT. This finding underscores the failure of current methods to address the root causes of unfairness. Moreover, current methods struggle to preserve satisfactory recommendation performance. To tackle these issues, we propose an Unfair-to-Fair evOlving (UFO) framework using a self-play mechanism, formulating unfairness mitigation as a two-player game. UFO alternates between two player roles: the extit{judger}, which identifies unfairness from both pre-training and SFT, and the extit{corrector}, which adjusts the LRS to address identified unfairness while preserving recommendation performance. Iterative optimization between these roles enables UFO to completely resolve unfairness. Extensive experiments demonstrate that UFO effectively mitigates unfairness while improving recommendation performance.

Problem

Research questions and friction points this paper is trying to address.

Mitigates item-side unfairness in LLM-based recommender systems

Addresses inherent biases from both pre-training and fine-tuning stages

Preserves recommendation performance while resolving unfairness issues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-play fine-tuning mitigates unfairness in LLMs

Two-player game identifies and corrects unfairness biases

Iterative optimization balances fairness and recommendation performance

🔎 Similar Papers

No similar papers found.

Authors to Follow