Pseudo-Label NCF for Sparse OHC Recommendation: Dual Representation Learning and the Separability-Accuracy Trade-off

📅 2026-03-25
🤖 AI Summary
This work addresses data sparsity and cold-start challenges in online health communities by proposing a pseudo-label-driven framework that learns dual embedding spaces. The approach derives pseudo-labels from user questionnaire responses and support group features and uses them as auxiliary supervision signals. These pseudo-labels extend neural collaborative filtering models, including matrix factorization (MF), multilayer perceptrons (MLP), and Neural Matrix Factorization (NeuMF), with a dual representation architecture: a primary embedding for ranking and a pseudo-label embedding for semantic alignment. Evaluated on a dataset of 165 users and 498 groups, the method improves HR@5 for all three models (e.g., from 4.58% to 5.42% for MF) and yields higher cosine silhouette scores in the pseudo-label embeddings than in the baseline embeddings. It also reveals a negative correlation between embedding separability and ranking accuracy, pointing to a trade-off between interpretability and performance.
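To make the HR@5 figures quoted above concrete, here is a minimal sketch of hit rate at K under a leave-one-out split. It assumes one held-out group per user and that the held-out group is ranked against all candidate groups; the summary does not say whether the authors rank against all groups or a sampled subset, and the function name `hit_rate_at_k` and the toy scores are hypothetical.

```python
import numpy as np


def hit_rate_at_k(scores: np.ndarray, held_out: np.ndarray, k: int = 5) -> float:
    """Fraction of users whose held-out item appears in their top-k scores.

    scores:   (n_users, n_items) predicted relevance, higher is better.
    held_out: (n_users,) index of the single held-out item per user.
    Assumption: every item is a ranking candidate (no negative sampling).
    """
    # argpartition on negated scores puts the k highest-scoring items
    # in the first k columns (in no particular order).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == held_out[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 2 users, 3 groups.
toy_scores = np.array([[0.9, 0.1, 0.5],
                       [0.2, 0.8, 0.3]])
toy_held_out = np.array([0, 2])
hr_at_1 = hit_rate_at_k(toy_scores, toy_held_out, k=1)
hr_at_2 = hit_rate_at_k(toy_scores, toy_held_out, k=2)
```

With 165 users, each user contributes roughly 0.6 percentage points to the metric, so differences of about one point correspond to only one or two additional hits.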

📝 Abstract
Online Health Communities connect patients for peer support, but users face a discovery challenge when they have minimal prior interactions to guide personalization. We study recommendation under extreme interaction sparsity in a survey-driven setting where each user provides a 16-dimensional intake vector and each support group has a structured feature profile. We extend Neural Collaborative Filtering architectures, including Matrix Factorization, Multi-Layer Perceptron, and NeuMF, with an auxiliary pseudo-label objective derived from survey-group feature alignment using cosine similarity mapped to [0, 1]. The resulting Pseudo-Label NCF learns dual embedding spaces: main embeddings for ranking and pseudo-label embeddings for semantic alignment. We evaluate on a dataset of 165 users and 498 support groups using a leave-one-out protocol that reflects cold-start conditions. All pseudo-label variants improve ranking performance: MLP improves HR@5 from 2.65% to 5.30%, NeuMF from 4.46% to 5.18%, and MF from 4.58% to 5.42%. Pseudo-label embedding spaces also show higher cosine silhouette scores than baseline embeddings, with MF improving from 0.0394 to 0.0684 and NeuMF from 0.0263 to 0.0653. We further observe a negative correlation between embedding separability and ranking accuracy, indicating a trade-off between interpretability and performance. These results show that survey-derived pseudo-labels improve recommendation under extreme sparsity while producing interpretable, task-specific embedding spaces.
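The pseudo-label construction described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how cosine similarity is mapped to [0, 1], so the affine map (cos + 1) / 2 is assumed here, and group feature profiles are assumed to live in the same 16-dimensional space as the user intake vectors. The function name `pseudo_labels` and the random data are hypothetical.

```python
import numpy as np


def pseudo_labels(user_vecs: np.ndarray, group_vecs: np.ndarray,
                  eps: float = 1e-12) -> np.ndarray:
    """Return an (n_users, n_groups) matrix of pseudo-labels in [0, 1].

    Assumption: labels are cosine similarities rescaled with (cos + 1) / 2.
    The source only states that cosine similarity is "mapped to [0, 1]".
    """
    # L2-normalize rows; eps guards against division by zero.
    u = user_vecs / np.maximum(np.linalg.norm(user_vecs, axis=1, keepdims=True), eps)
    g = group_vecs / np.maximum(np.linalg.norm(group_vecs, axis=1, keepdims=True), eps)
    cos = np.clip(u @ g.T, -1.0, 1.0)  # clip away floating-point overshoot
    return (cos + 1.0) / 2.0


# Synthetic stand-ins with the dataset's dimensions (165 users, 498 groups).
rng = np.random.default_rng(0)
users = rng.normal(size=(165, 16))    # 16-dimensional intake vectors
groups = rng.normal(size=(498, 16))   # assumed same feature space
labels = pseudo_labels(users, groups)
```

Each entry of `labels` could then serve as the regression or soft classification target for the auxiliary pseudo-label head, alongside the main ranking objective.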
Problem

Research questions and friction points this paper is trying to address.

sparse recommendation
online health communities
cold start
interaction sparsity
pseudo labeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pseudo Labeling
Neural Collaborative Filtering
Extreme Sparsity
Dual Representation Learning
Cold Start Recommendation