BehaveGPT: A Foundation Model for Large-scale User Behavior Modeling

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address weak generalization in user behavior modeling—caused by long temporal dependencies, complex contextual interactions, and severe long-tail distributions—this paper introduces the first foundation model for large-scale user behavior prediction. Methodologically, it proposes a novel distributionally robust optimization (DRO)-driven pretraining paradigm integrated with a Transformer architecture, enabling effective multi-task fine-tuning and cross-domain adaptation. It empirically establishes, for the first time in user behavior modeling, scaling laws linking model performance to both data volume and parameter count. Evaluated on real-world industrial datasets, the model achieves over 10% improvements in macro-recall and weighted recall compared to state-of-the-art methods. This work establishes a scalable, robust, and general-purpose foundation model paradigm for user behavior modeling, advancing both theoretical understanding and practical deployment.

📝 Abstract
In recent years, foundational models have revolutionized the fields of language and vision, demonstrating remarkable abilities in understanding and generating complex data; however, similar advances in user behavior modeling have been limited, largely due to the complexity of behavioral data and the challenges involved in capturing intricate temporal and contextual relationships in user activities. To address this, we propose BehaveGPT, a foundational model designed specifically for large-scale user behavior prediction. Leveraging transformer-based architecture and a novel pretraining paradigm, BehaveGPT is trained on vast user behavior datasets, allowing it to learn complex behavior patterns and support a range of downstream tasks, including next behavior prediction, long-term generation, and cross-domain adaptation. Our approach introduces the DRO-based pretraining paradigm tailored for user behavior data, which improves model generalization and transferability by equitably modeling both head and tail behaviors. Extensive experiments on real-world datasets demonstrate that BehaveGPT outperforms state-of-the-art baselines, achieving more than a 10% improvement in macro and weighted recall, showcasing its ability to effectively capture and predict user behavior. Furthermore, we measure the scaling law in the user behavior domain for the first time on the Honor dataset, providing insights into how model performance scales with increased data and parameter sizes.
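The abstract's scaling-law measurement refers to fitting how loss falls with data or parameter count, typically as a power law L(N) = a·N^(−α) estimated by linear regression in log-log space. The paper's actual exponents are not given on this page; the sketch below fits synthetic points generated from an arbitrary, illustrative power law, so the values of `a` and `alpha` are assumptions, not the paper's results.

```python
import math

def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) via least squares in log-log space.

    Taking logs gives log L = log a - alpha * log N, a straight line,
    so ordinary linear regression on (log N, log L) recovers (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    alpha = -slope
    a = math.exp(my + alpha * mx)
    return a, alpha

# Synthetic points following L(N) = 5 * N**-0.3 (illustrative only;
# not the exponents measured on the Honor dataset).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5 * n ** -0.3 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
```

On real measurements the fit would be noisy, so one would check the residuals in log-log space before trusting the extrapolation to larger model or data sizes.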
Problem

Research questions and friction points this paper is trying to address.

Lack of foundational models for user behavior modeling
Challenges in capturing temporal and contextual behavior relationships
Need for improved generalization in behavior prediction tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based architecture for behavior modeling
DRO-based pretraining for equitable behavior learning
Scaling law measurement in user behavior domain
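The page does not spell out the DRO objective, so as a rough illustration only: one common DRO instantiation (group DRO) maintains per-group weights that are exponentially up-weighted on high-loss groups, which is how tail behaviors can be modeled "equitably" alongside head behaviors. The group losses, step size `eta`, and head/mid/tail split below are all hypothetical, not taken from the paper.

```python
import math

def group_dro_step(group_losses, q, eta=0.1):
    """One group-DRO reweighting step (exponentiated-gradient style).

    Each group's weight is scaled by exp(eta * loss), then the weights
    are renormalized, so persistently high-loss groups (e.g. tail
    behavior classes) receive more weight in the training objective.
    Returns the reweighted (robust) loss and the updated weights.
    """
    q = [w * math.exp(eta * l) for w, l in zip(q, group_losses)]
    total = sum(q)
    q = [w / total for w in q]
    robust_loss = sum(w * l for w, l in zip(q, group_losses))
    return robust_loss, q

# Hypothetical per-group losses: head behaviors fit well, tail poorly.
losses = [0.2, 0.5, 2.0]   # head, mid, tail behavior groups
q = [1 / 3, 1 / 3, 1 / 3]  # start from uniform group weights
for _ in range(50):
    robust_loss, q = group_dro_step(losses, q, eta=0.5)
# Weight concentrates on the worst-off (tail) group, so the robust
# loss approaches that group's loss rather than the head-dominated mean.
```

In a full pretraining loop the group losses would be recomputed from minibatches each step, so the weights track whichever behavior groups the model currently fits worst.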
Jiahui Gong
Tsinghua University
Machine Learning · Spatial Temporal Prediction · Recommender System
Jingtao Ding
Tsinghua University
Spatio-temporal Data Mining · Complex Networks · Synthetic Data · Recommender Systems
Fanjin Meng
Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China
Chen Yang
Honor Device Co., Ltd, Beijing, China
Hong Chen
Honor Device Co., Ltd, Beijing, China
Zuojian Wang
Honor Device Co., Ltd, Beijing, China
Haisheng Lu
Honor Device Co., Ltd, Beijing, China
Yong Li
Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China