🤖 AI Summary
Implicit feedback, such as clicks, often contains substantial noise that degrades recommendation performance. This work proposes a lightweight denoising approach that leverages a pretrained language model to construct textual user interest profiles and computes semantic similarity between these profiles and item descriptions. The resulting similarity scores are used to reweight training samples in the loss function, automatically downweighting semantically inconsistent clicks. Notably, the method requires only a modification to the loss function: the backbone model is left unchanged, and no auxiliary networks or multi-stage training procedures are introduced. Evaluated on two real-world datasets, the approach achieves a relative AUC improvement of up to 2.2% and demonstrates strong robustness under high-noise conditions, confirming its effectiveness and practicality.
📝 Abstract
Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. However, click interactions inherently contain substantial noise, including accidental clicks, clickbait-induced interactions, and exploratory browsing behaviors that do not reflect genuine user preferences. Training recommendation models with such noisy positive samples leads to degraded prediction accuracy and unreliable recommendations. In this paper, we propose SAID (Semantics-Aware Implicit Denoising), a simple yet effective framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Our approach constructs textual user interest profiles from historical behaviors and computes semantic similarity with target item descriptions using pre-trained language model (PLM)-based text encoders. The similarity scores are then transformed into sample weights that modulate the training loss, effectively reducing the impact of semantically inconsistent clicks. Unlike existing denoising methods that require complex auxiliary networks or multi-stage training procedures, SAID only modifies the loss function while keeping the backbone recommendation model unchanged. Extensive experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines, with particularly notable robustness under high-noise conditions.
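The core mechanism the abstract describes, turning profile–item semantic similarity into per-sample loss weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy embeddings stand in for PLM encodings, and the sigmoid-of-similarity transform and its temperature are assumed choices for mapping similarity to a weight.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_to_weight(sim, temperature=0.5):
    """Map a similarity in [-1, 1] to a sample weight in (0, 1).

    Illustrative transform (not from the paper): a sigmoid whose
    temperature controls how sharply low-similarity (likely noisy)
    clicks are downweighted.
    """
    return 1.0 / (1.0 + math.exp(-sim / temperature))

def weighted_bce_loss(preds, labels, weights):
    """Sample-weighted binary cross-entropy over a batch."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(preds, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)

# Toy vectors standing in for PLM encodings of one user's textual
# interest profile and two clicked items' descriptions.
user_profile = [0.9, 0.1, 0.2]
item_consistent = [0.8, 0.2, 0.1]   # semantically close to the profile
item_noisy = [-0.7, 0.6, 0.1]       # semantically inconsistent click

weights = [
    similarity_to_weight(cosine_similarity(user_profile, item_consistent)),
    similarity_to_weight(cosine_similarity(user_profile, item_noisy)),
]
# Both clicks are positives (label 1), but the inconsistent one
# contributes less to the loss because its weight is smaller.
loss = weighted_bce_loss([0.7, 0.6], [1, 1], weights)
```

Because only the loss computation changes, the backbone model producing the click predictions is untouched, which matches the framework's stated single-stage, no-auxiliary-network design.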