🤖 AI Summary
To address the insufficient modeling of long-range collaborative signals in collaborative filtering and the performance degradation under sparse or noisy data, this paper proposes a position-aware Graph Transformer recommendation model. Methodologically, it is the first to systematically integrate multiple types of positional encodings—distance-, path-, and topology-based—into Graph Transformers, explicitly fusing node positions with graph structural information; the Transformer's global representations are then combined linearly with the GCN's local neighborhood aggregates at the embedding layer, overcoming the GCN's limited receptive field. Extensive experiments on four real-world datasets demonstrate that the model significantly outperforms baselines including PinSage and LightGCN, and exhibits superior robustness under interaction sparsity and label noise. The core contribution is a position-aware Graph Transformer architecture that unifies long-range collaborative signal modeling with local structural dependency learning.
📝 Abstract
Collaborative recommendation fundamentally involves learning high-quality user and item representations from interaction data. Recently, graph convolutional networks (GCNs) have advanced the field by utilizing high-order connectivity patterns in interaction graphs, as evidenced by state-of-the-art methods like PinSage and LightGCN. However, one key limitation has not been well addressed in existing solutions: capturing long-range collaborative filtering signals, which are crucial for modeling user preference. In this work, we propose a new graph transformer (GT) framework -- *Position-aware Graph Transformer for Recommendation* (PGTR), which combines the global modeling capability of Transformer blocks with the local neighborhood feature extraction of GCNs. The key insight is to explicitly incorporate node position and structure information from the user-item interaction graph into the GT architecture via several purpose-designed positional encodings. The long-range collaborative signals from the Transformer block are then combined linearly with the local neighborhood features from the GCN backbone to enhance node embeddings for final recommendations. Empirical studies demonstrate the effectiveness of the proposed PGTR method when implemented on various GCN-based backbones across four real-world datasets, as well as its robustness against interaction sparsity and noise.
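The two-branch design in the abstract (a global Transformer branch over position-encoded node features, linearly combined with a local GCN-style aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy interaction graph, the degree-based positional encoding, the single-head attention, and the mixing weight `alpha` are all simplifying assumptions chosen for illustration.

```python
import numpy as np

# Hedged sketch of the PGTR idea, not the paper's code: fuse a global
# self-attention branch with a local GCN-style aggregation branch.
rng = np.random.default_rng(0)

n_nodes, dim = 6, 8
# Toy user-item interaction graph (3 users: 0-2, 3 items: 3-5), made bipartite.
edges = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5)]
A = np.zeros((n_nodes, n_nodes))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
X = rng.normal(size=(n_nodes, dim))          # base node embeddings

# Topology-based positional encoding (one simple choice: normalized degree).
deg = A.sum(axis=1, keepdims=True)
pos = deg / deg.max()
X_pe = X + pos                               # inject position into features

# Global branch: single-head self-attention over ALL nodes (long-range signal).
scores = X_pe @ X_pe.T / np.sqrt(dim)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
global_feats = attn @ X_pe

# Local branch: one step of mean-neighborhood aggregation (GCN-style).
A_hat = A + np.eye(n_nodes)                  # add self-loops
local_feats = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

# Linear combination of the two branches, as the abstract describes.
alpha = 0.5                                  # hypothetical mixing weight
final_embeddings = alpha * global_feats + (1 - alpha) * local_feats
print(final_embeddings.shape)
```

In the actual model the positional encodings are learned from distance, path, and topology information rather than raw degree, and the GCN backbone (e.g., LightGCN) is multi-layer; this sketch only shows how the global and local branches are fused at the embedding level.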