🤖 AI Summary
To address weak generalization in user behavior modeling, caused by long temporal dependencies, complex contextual interactions, and severe long-tail distributions, this paper introduces the first foundation model for large-scale user behavior prediction. Methodologically, it proposes a novel pretraining paradigm driven by distributionally robust optimization (DRO) and integrated with a Transformer architecture, enabling effective multi-task fine-tuning and cross-domain adaptation. It also establishes, for the first time in user behavior modeling, empirical scaling laws linking model performance to both data volume and parameter count. Evaluated on real-world industrial datasets, the model achieves improvements of over 10% in macro-recall and weighted recall compared to state-of-the-art methods. This work establishes a scalable, robust, and general-purpose foundation-model paradigm for user behavior modeling, advancing both theoretical understanding and practical deployment.
📝 Abstract
In recent years, foundation models have revolutionized the fields of language and vision, demonstrating remarkable abilities in understanding and generating complex data. Similar advances in user behavior modeling, however, have been limited, largely due to the complexity of behavioral data and the difficulty of capturing the intricate temporal and contextual relationships in user activities. To address this, we propose BehaveGPT, a foundation model designed specifically for large-scale user behavior prediction. Leveraging a transformer-based architecture and a novel pretraining paradigm, BehaveGPT is trained on vast user behavior datasets, allowing it to learn complex behavior patterns and to support a range of downstream tasks, including next-behavior prediction, long-term generation, and cross-domain adaptation. Our approach introduces a pretraining paradigm based on distributionally robust optimization (DRO) and tailored to user behavior data, which improves generalization and transferability by modeling head and tail behaviors equitably. Extensive experiments on real-world datasets demonstrate that BehaveGPT outperforms state-of-the-art baselines by more than 10% in macro-recall and weighted recall, showing that it effectively captures and predicts user behavior. Furthermore, we measure the scaling law in the user behavior domain for the first time on the Honor dataset, providing insight into how model performance scales with increased data and parameter sizes.
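The abstract does not spell out the DRO objective, but the idea of "equitably modeling both head and tail behaviors" is commonly realized with a group-DRO-style reweighted loss: behavior classes with higher loss (typically long-tail behaviors) are up-weighted so the model cannot optimize only for head behaviors. A minimal sketch of that idea, with all names and the step size hypothetical rather than taken from the paper:

```python
import numpy as np

def group_dro_step(group_losses, weights, eta=0.1):
    """One group-DRO-style reweighting step: up-weight behavior
    groups with higher loss (e.g. tail behaviors), renormalize,
    and return the robust (weighted) training objective."""
    w = weights * np.exp(eta * group_losses)  # exponentiated-gradient update
    w = w / w.sum()                           # keep weights on the simplex
    robust_loss = float(np.dot(w, group_losses))
    return w, robust_loss

# Toy example: a well-fit head behavior vs. an underfit tail behavior.
weights = np.array([0.5, 0.5])
losses = np.array([0.2, 1.0])
weights, robust_loss = group_dro_step(losses, weights)
# The tail group now carries more weight, so the objective
# exceeds the plain average loss of 0.6.
```

The robust objective is then minimized over model parameters while the group weights track whichever behaviors are currently hardest, which is what pushes the model toward uniform performance across the head/tail distribution.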