🤖 AI Summary
In high-dimensional sparse settings, such as video ad recommendation, contextual multi-armed bandits degrade due to scarce interaction data (few-shot regimes), measurement bias, and the curse of dimensionality. To address this, we propose a dynamic weighted Thompson sampling framework that integrates an adaptive weight adjustment mechanism into the Bayesian posterior update, coupled with adaptive feature scaling and dynamic probability allocation tailored to high-dimensional sparse contexts. This design mitigates cold-start bias while accelerating policy convergence and improving generalization. In benchmark experiments with varying numbers of arms and effect sizes, our method achieves an 18.7% average improvement in cumulative reward over state-of-the-art contextual bandit approaches and converges 2.3× faster during the cold-start phase. The framework provides efficient, scalable, real-time decision support for personalized interventions.
📝 Abstract
Experiment-design scenarios such as video content advertising, where different content options compete for user engagement, can be modeled as multi-armed bandit problems. When external factors, such as the cost of conducting experiments, limit the number of user interactions, recommenders must learn from very little data. There is also a trade-off between selecting the best treatment overall and personalizing treatments to individual factors. A popular solution to this dilemma is the Contextual Bandit framework, which aims to maximize outcomes while incorporating contextual factors, such as a user's profile, to customize treatments to individual preferences. Despite their advantages, Contextual Bandit algorithms face challenges like measurement bias and the 'curse of dimensionality': managing numerous interventions is difficult, and segmenting participants leads to data sparsity. To address these problems, we introduce the Weighted Allocation Probability Adjusted Thompson Sampling (WAPTS) algorithm. WAPTS builds on the contextual Thompson Sampling method by using a dynamic weighting parameter, which improves the allocation of interventions and enables rapid optimization in data-sparse environments. We demonstrate the performance of our approach across different numbers of arms and effect sizes.
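To make the core idea concrete, here is a minimal sketch of Thompson Sampling with a per-arm weight applied to the posterior draw. The abstract does not specify WAPTS's actual weighting rule, so the weight function below (boosting under-pulled arms and decaying toward 1 as data accumulates) is a hypothetical illustration of how a dynamic weight can reshape allocation probabilities in data-sparse settings, not the published algorithm; the class and method names are likewise invented for this example.

```python
import random


class WeightedThompsonSampler:
    """Beta-Bernoulli Thompson Sampling with a dynamic per-arm weight.

    The weight rule here is a hypothetical stand-in: it inflates the
    posterior draw of arms with few pulls, shifting allocation toward
    under-explored arms during the cold-start phase.
    """

    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.alpha = [1.0] * n_arms  # Beta posterior: successes + 1
        self.beta = [1.0] * n_arms   # Beta posterior: failures + 1
        self.pulls = [0] * n_arms
        self.rng = random.Random(seed)

    def _weight(self, arm):
        # Hypothetical dynamic weight: starts at 2 for an unpulled arm
        # and decays toward 1 as the arm accumulates observations.
        return 1.0 + 1.0 / (1.0 + self.pulls[arm])

    def select_arm(self):
        # Sample from each arm's posterior, scale by its weight,
        # and play the arm with the highest weighted draw.
        scores = [
            self._weight(a) * self.rng.betavariate(self.alpha[a], self.beta[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        # Standard Bayesian posterior update for a 0/1 reward.
        self.pulls[arm] += 1
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


# Usage: simulate three arms with different true engagement rates.
sampler = WeightedThompsonSampler(n_arms=3, seed=0)
rates = [0.2, 0.5, 0.8]
env = random.Random(1)
for _ in range(2000):
    a = sampler.select_arm()
    r = 1 if env.random() < rates[a] else 0
    sampler.update(a, r)
```

After enough rounds the weighted sampler concentrates its pulls on the best arm, while the early-round weight keeps sparse arms from being starved before their posteriors are informative.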