MTGRBoost: Boosting Large-scale Generative Recommendation Models in Meituan

📅 2025-05-19
🤖 AI Summary
Industrial training systems for generative recommendation models (GRMs) suffer from inefficient sparse embedding updates, GPU load imbalance, and suboptimal embedding lookup performance. To address these challenges, this paper introduces the first efficient and scalable system tailored for large-scale GRM training. Our method features: (1) a dynamic hash table replacing static embedding tables to enable real-time embedding insertion/deletion and low-latency lookup; (2) a dynamic sequence balancing strategy coupled with embedding ID deduplication and automatic table merging to mitigate long-tail distribution effects and redundant parameter updates; and (3) integration of mixed-precision training, gradient accumulation, operator fusion, and fault-tolerant checkpointing. Experiments demonstrate 1.6×–2.4× higher training throughput and near-linear scalability up to 100 GPUs. The system has been deployed in production at Meituan, serving over 100 million inference requests daily.
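The dynamic hash table described above can be sketched in a few lines. The following is a minimal, CPU-only illustration with insert-on-miss lookup and explicit eviction; the class and method names are assumptions for illustration, not the paper's implementation (MTGRBoost's table is GPU-resident and concurrent):

```python
import random

class DynamicHashEmbedding:
    """Hash-table-backed embedding store: entries can be inserted and
    deleted at runtime, unlike a statically sized embedding table.
    Hypothetical sketch, not MTGRBoost's actual data structure."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # sparse feature id -> embedding vector

    def lookup(self, ids):
        # Insert-on-miss: an unseen id gets a freshly initialized
        # vector, mimicking real-time insertion of new sparse features.
        rows = []
        for key in ids:
            if key not in self.table:
                self.table[key] = [self.rng.gauss(0.0, 0.01)
                                   for _ in range(self.dim)]
            rows.append(self.table[key])
        return rows

    def evict(self, ids):
        # Real-time deletion, e.g. dropping stale or expired item ids.
        for key in ids:
            self.table.pop(key, None)
```

Because the table is a hash map rather than a fixed array, the id space does not need to be known (or bounded) before training starts.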

📝 Abstract
Recommendation is crucial for both user experience and company revenue, and generative recommendation models (GRMs) are shown to produce quality recommendations recently. However, existing systems are limited by insufficient functionality support and inefficient implementations for training GRMs in industrial scenarios. As such, we introduce MTGRBoost as an efficient and scalable system for GRM training. Specifically, to handle the real-time insert/delete of sparse embedding entries, MTGRBoost employs dynamic hash tables to replace static tables. To improve efficiency, MTGRBoost conducts dynamic sequence balancing to address the computation load imbalances among GPUs and adopts embedding ID deduplication alongside automatic table merging to accelerate embedding lookup. MTGRBoost also incorporates implementation optimizations including checkpoint resuming, mixed precision training, gradient accumulation, and operator fusion. Extensive experiments show that MTGRBoost improves training throughput by 1.6×–2.4× while achieving good scalability when running over 100 GPUs. MTGRBoost has been deployed for many applications in Meituan and is now handling hundreds of millions of requests on a daily basis.
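The embedding ID deduplication mentioned in the abstract amounts to looking up each distinct id once and scattering the result back to all of its occurrences, which avoids redundant lookups (and redundant gradient rows) when ids repeat across long behavior sequences. A stdlib-only sketch; the helper name and signature are assumptions:

```python
def dedup_lookup(ids, lookup_fn):
    """Look up each distinct id once, then expand the results back to
    the original order. Illustrative sketch, not MTGRBoost's kernel."""
    unique = []            # distinct ids, in first-seen order
    inverse = []           # position of each original id in `unique`
    index_of = {}
    for key in ids:
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(key)
        inverse.append(index_of[key])
    rows = lookup_fn(unique)           # one lookup per distinct id
    return [rows[i] for i in inverse]  # scatter back to all positions
```

The same unique/inverse indexing also lets gradients for repeated ids be accumulated into a single table entry rather than written several times.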
Problem

Research questions and friction points this paper is trying to address.

Insufficient functionality support and efficiency in existing systems for training GRMs
Handling real-time insertion/deletion of sparse embedding entries
GPU computation load imbalance and slow embedding lookup
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic hash tables for real-time sparse embedding updates
Dynamic sequence balancing to equalize GPU workloads
Embedding ID deduplication with automatic table merging
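Dynamic sequence balancing can be illustrated with a greedy longest-processing-time assignment: sequences are placed, longest first, on the currently lightest-loaded GPU, so long-tail sequences do not pile up on one device. The greedy heuristic here is an assumption for illustration; the paper's exact balancing strategy may differ:

```python
import heapq

def balance_sequences(seq_lengths, num_gpus):
    """Assign variable-length sequences to GPUs so the total token
    count per GPU is roughly equal (greedy LPT heuristic).
    Returns one list of sequence indices per GPU."""
    # Min-heap of (current token load, gpu id): pop = lightest GPU.
    heap = [(0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_gpus)]
    # Place the longest sequences first to keep loads even.
    for idx in sorted(range(len(seq_lengths)),
                      key=lambda i: -seq_lengths[i]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(idx)
        heapq.heappush(heap, (load + seq_lengths[idx], gpu))
    return assignment
```

With skewed (long-tail) length distributions, balancing by token count rather than by sequence count keeps per-step computation similar across devices, so no GPU idles waiting for a straggler.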
Yuxiang Wang
Wuhan University
Xiao Yan
Meituan
Chi Ma
Meituan
Mincong Huang
Meituan
Xiaoguang Li
Noah's Ark Lab, HUAWEI
Lei Yu
Meituan
Chuan Liu
University of Rochester
Ruidong Han
Meituan
He Jiang
Meituan
Bin Yin
Meituan
Shangyu Chen
Meituan
Fei Jiang
Meituan
Xiang Li
Meituan
Wei Lin
Meituan
Haowei Han
Wuhan University
Bo Du
Department of Management, Griffith Business School
Jiawei Jiang
Wuhan University