🤖 AI Summary
In industrial recommendation systems, decoupled training of graph neural networks (GNNs) and recommendation models incurs high computational overhead and blocks gradient backpropagation, so the recommendation objective cannot optimize the GNN's representations. To address these issues, this paper proposes E2E-GRec, an end-to-end framework that jointly trains a GNN with the recommendation model. Its key contributions are: (1) a lightweight graph feature autoencoder based on subgraph sampling, which reduces inference cost; (2) a two-level heterogeneous feature fusion mechanism that unifies user-item interactions with attribute-graph structural information; and (3) GradNorm-based dynamic multi-task loss balancing coupled with self-supervised graph feature learning to improve generalization. Online A/B testing demonstrates significant improvements: a 0.133% relative increase in user session duration and a 0.3171% reduction in video skip rate, with multiple core metrics outperforming the conventional two-stage paradigm.
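The GradNorm-style dynamic loss balancing mentioned above can be sketched as a per-step update of the task loss weights. This is a simplified, hypothetical illustration in plain Python (the function name, the sign-based update, and the hyperparameters `alpha` and `lr` are assumptions for exposition; the paper's exact formulation may differ):

```python
def gradnorm_weights(weights, grad_norms, loss_ratios, alpha=1.5, lr=0.025):
    """One GradNorm-style update of per-task loss weights (illustrative sketch).

    weights     -- current loss weights w_i, one per task
    grad_norms  -- norm G_i of the gradient of each weighted task loss
    loss_ratios -- L_i(t) / L_i(0), the inverse training rate of each task
    alpha       -- balancing strength (higher pulls harder on lagging tasks)
    lr          -- step size for the weight update
    """
    n = len(weights)
    mean_norm = sum(grad_norms) / n
    mean_ratio = sum(loss_ratios) / n
    # Relative inverse training rate: tasks that improved less get r_i > 1.
    rel_rates = [r / mean_ratio for r in loss_ratios]
    # Each task's target gradient norm grows with its relative lag.
    targets = [mean_norm * (r ** alpha) for r in rel_rates]
    # Sign-based descent step on |G_i - target_i| with respect to w_i:
    # shrink the weight of tasks whose gradients are too large, grow the rest.
    new_w = [w - lr * (1.0 if g > t else -1.0)
             for w, g, t in zip(weights, grad_norms, targets)]
    # Renormalize so the weights sum to the number of tasks, as in GradNorm.
    total = sum(new_w)
    return [n * w / total for w in new_w]
```

With two tasks at equal training rates but unequal gradient norms, the update shifts weight away from the task whose gradients dominate, which is the balancing behavior the summary attributes to the framework.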
📝 Abstract
Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, for example to capture complex user-item and item-item relations. However, most industrial deployments adopt a two-stage pipeline: GNNs are first pre-trained offline to generate node embeddings, which are then used as static features for downstream recommender systems. This decoupled paradigm leads to two key limitations: (1) high computational overhead, since large-scale GNN inference must be repeatedly executed to refresh embeddings; and (2) lack of joint optimization, as gradients from the recommender system cannot directly influence the GNN learning process, leaving the GNN suboptimally informative for the recommendation task. In this paper, we propose E2E-GRec, a novel end-to-end training framework that unifies GNN training with the recommender system. Our framework is characterized by three key components: (i) efficient subgraph sampling from a large-scale cross-domain heterogeneous graph to ensure training scalability and efficiency; (ii) a Graph Feature Auto-Encoder (GFAE) serving as an auxiliary self-supervised task that guides the GNN to learn structurally meaningful embeddings; and (iii) a two-level feature fusion mechanism combined with GradNorm-based dynamic loss balancing, which stabilizes graph-aware multi-task end-to-end training. Extensive offline evaluations, online A/B tests on large-scale production data (e.g., a +0.133% relative improvement in stay duration and a 0.3171% reduction in the average number of videos a user skips), together with theoretical analysis, demonstrate that E2E-GRec consistently surpasses traditional approaches, yielding significant gains across multiple recommendation metrics.
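Component (i), fanout-limited subgraph sampling, can be illustrated with a minimal sketch over an adjacency list. All names here (`sample_subgraph`, `fanout`, `hops`) are hypothetical; the paper's sampler operates on a cross-domain heterogeneous graph and is more elaborate:

```python
import random

def sample_subgraph(adj, seeds, fanout=2, hops=2, rng=None):
    """Sample a k-hop subgraph by keeping at most `fanout` neighbors per node.

    adj   -- adjacency list: node -> list of neighbor nodes
    seeds -- target nodes (e.g., the user/item in the current training example)
    Returns the set of sampled nodes and the list of sampled edges.
    """
    rng = rng or random.Random(0)
    nodes, edges = set(seeds), []
    frontier = list(seeds)
    for _ in range(hops):
        next_frontier = []
        for u in frontier:
            neighbors = adj.get(u, [])
            # Cap the expansion per node so the subgraph stays small even
            # when the full graph has high-degree hub nodes.
            picked = neighbors if len(neighbors) <= fanout else rng.sample(neighbors, fanout)
            for v in picked:
                edges.append((u, v))  # edges to already-seen nodes are kept
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges
```

Because the per-example subgraph is bounded by roughly `fanout ** hops` nodes regardless of the full graph's size, GNN forward and backward passes fit inside the recommender's training step, which is what makes the end-to-end joint training tractable.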