Does LLM Focus on the Right Words? Diagnosing Language Bias in LLM-based Recommenders

📅 2025-10-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses language bias in LLM-based recommender systems induced by supervised fine-tuning (SFT), wherein models over-rely on auxiliary tokens (e.g., task instructions) while neglecting user interaction tokens that encode genuine preferences. To mitigate this, we propose GDRT—a novel fine-tuning paradigm grounded in group-wise distributionally robust optimization (DRO). GDRT semantically partitions input tokens into user interaction groups and auxiliary text groups, then dynamically reweights their contributions to suppress interference from auxiliary tokens and strengthen user behavior modeling. Evaluated on three public benchmarks, GDRT achieves an average 24.29% improvement in NDCG@10, significantly enhancing both recommendation accuracy and fairness. To our knowledge, GDRT is the first framework to systematically integrate distributionally robust optimization for token-level bias mitigation in LLM recommenders.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs), owing to their extensive open-domain knowledge and semantic reasoning capabilities, have been increasingly integrated into recommender systems (RS). However, a substantial gap remains between the pre-training objectives of LLMs and the specific requirements of recommendation tasks. To address this gap, supervised fine-tuning (SFT) is commonly performed on specially curated recommendation datasets to further enhance their predictive ability. Despite its success, SFT exhibits a critical limitation: it induces Language Bias, whereby the model over-relies on auxiliary tokens-such as task descriptions and prefix-generated tokens-while underutilizing core user interaction tokens that encode user-specific preferences. This bias not only undermines recommendation accuracy but also raises unfairness concerns. To address this issue, we propose Group Distributionally Robust Optimization-based Tuning (GDRT), a novel fine-tuning paradigm that enforces consistent model performance across token groups with varying degrees of relevance to auxiliary tokens. By adaptively upweighting underperforming groups, typically those weakly correlated with auxiliary tokens, GDRT shifts the model's attention from superficial auxiliary cues to informative user interaction tokens, thereby mitigating language bias. Extensive experiments conducted on three public datasets demonstrate that GDRT effectively mitigates language bias, yielding substantial improvements in recommendation accuracy (with an average NDCG@10 gain of 24.29%) and significantly enhancing recommendation fairness.
Problem

Research questions and friction points this paper is trying to address.

LLMs over-rely on auxiliary tokens in recommendation systems
Fine-tuning causes underutilization of core user interaction tokens
Language bias reduces recommendation accuracy and fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group Distributionally Robust Optimization-based Tuning paradigm
Enforces consistent performance across token groups
Adaptively upweights underperforming token groups
🔎 Similar Papers
No similar papers found.
Bohao Wang
Bohao Wang
College of Information Science & Electronic Engineering, Zhejiang University
Wireless AICommunication6GDigital TwinRay Tracing
J
Jiawei Chen
Zhejiang University
F
Feng Liu
OPPO Research Institute
C
Changwang Zhang
OPPO Research Institute
J
Jun Wang
OPPO Research Institute
Canghong Jin
Canghong Jin
Hangzhou City University
Data MiningBig data
C
Chun Chen
Zhejiang University
C
Can Wang
Zhejiang University