AI Summary
In e-commerce federated recommendation, naively incorporating global data (e.g., user/item catalogs) introduces noise that undermines model personalization. To address this, we propose a semantics-aligned selective enhancement method for global data. Our approach features: (1) an RL-driven probability estimator that dynamically selects high-quality global samples semantically consistent with local interaction patterns; (2) a pre-trained graph encoder that extracts structural features; and (3) a local validity predictor that guides federated collaborative training. The method preserves data privacy while suppressing interference from irrelevant global patterns, thereby enhancing local model personalization. Evaluated on mainstream federated recommendation benchmarks, it achieves up to a 34.86% improvement in Recall@10, outperforming state-of-the-art methods. This work is the first to systematically demonstrate the efficacy and scalability of principled global-data valuation and controlled utilization in federated recommendation.
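For readers unfamiliar with the reported metric, the following is a minimal sketch of how Recall@10 and a percentage improvement are commonly computed. It assumes the common definition (hits in the top K divided by the number of held-out relevant items) and a relative improvement over a baseline; the text does not say which variant the authors use, and the function names are hypothetical.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of a user's held-out relevant items found in the top-k ranking.

    Assumption: the denominator is the number of relevant items; some papers
    use min(k, number of relevant items) instead.
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score
```

Under this reading, an "up to 34.86%" gain would mean the best-case ratio between the method's Recall@10 and that of the strongest baseline, not an absolute difference in recall.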
Abstract
Federated Learning (FL) is gaining prominence in machine learning as privacy concerns grow. This paradigm allows each client (e.g., an individual online store) to train a recommendation model locally while sharing only model updates, without exposing the raw interaction logs to a central server, thereby preserving privacy in a decentralized environment. Nonetheless, most existing FL-based recommender systems still rely solely on each client's private data, despite the abundance of publicly available datasets that could be leveraged to enrich local training; this potential remains largely underexplored. To this end, we consider a realistic scenario wherein a large shopping platform collaborates with multiple small online stores to build a global recommender system. The platform possesses global data, such as shareable user and item lists, while each store holds a portion of the interaction data privately (i.e., locally). Although integrating global data can help mitigate the limitations of sparse and biased local client data, it also introduces additional challenges: simply combining all global interactions can amplify noise and irrelevant patterns, worsening personalization and increasing computational costs. To address these challenges, we propose FedGDVE, which selectively augments each client's local graph with semantically aligned samples from the global dataset. FedGDVE employs (i) a pre-trained graph encoder to extract global structural features, (ii) a local valid predictor to assess client-specific relevance, and (iii) a reinforcement-learning-based probability estimator to filter and sample only the most pertinent global interactions. FedGDVE improves performance by up to 34.86% on recognized benchmarks in FL environments.
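The selection step described above can be illustrated with a small sketch. This is not FedGDVE's implementation: it assumes a logistic selection policy trained with a plain REINFORCE update, uses hand-made 2-D features in place of the pre-trained graph encoder's output, and replaces the local valid predictor with a hypothetical reward that simply counts aligned versus unaligned selections. All names (`selection_probs`, `train_selector`, `toy_demo`) are illustrative.

```python
import numpy as np


def selection_probs(features, theta):
    """Per-sample probability of keeping each global candidate (logistic policy)."""
    logits = np.clip(features @ theta, -30.0, 30.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-logits))


def train_selector(features, reward_fn, steps=300, lr=0.5, seed=0):
    """REINFORCE: sample a keep/drop mask, score it, and reinforce good masks."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(features.shape[1])
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = selection_probs(features, theta)
        mask = rng.random(probs.shape[0]) < probs  # Bernoulli sample per candidate
        reward = reward_fn(mask)
        # Gradient of the Bernoulli log-likelihood w.r.t. theta: sum_i (m_i - p_i) x_i
        grad_log_prob = features.T @ (mask.astype(float) - probs)
        theta = theta + lr * (reward - baseline) * grad_log_prob
        baseline = 0.9 * baseline + 0.1 * reward
    return theta


def toy_demo(n=200, seed=1):
    """Toy data: half of the global candidates are 'aligned' with the local client."""
    rng = np.random.default_rng(seed)
    aligned = np.arange(n) < n // 2
    # Stand-in for encoder features: a noisy alignment signal plus a bias column.
    signal = np.where(aligned, 1.0, -1.0) + 0.1 * rng.standard_normal(n)
    features = np.column_stack([signal, np.ones(n)])

    # Hypothetical reward standing in for the local valid predictor.
    def reward_fn(mask):
        return (np.sum(mask & aligned) - np.sum(mask & ~aligned)) / n

    theta = train_selector(features, reward_fn)
    probs = selection_probs(features, theta)
    return probs[aligned].mean(), probs[~aligned].mean()
```

Calling `toy_demo()` returns the mean selection probability for aligned and unaligned candidates. In a real federated setting the reward would presumably come from the client's local validation performance rather than from ground-truth alignment labels, which are unavailable in practice.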