Optimal Exploration of New Products under Assortment Decisions

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses how an online platform should choose capacity-constrained assortments when it must trade off exploring new products of unknown quality against short-term revenue. Quality spreads through social learning: a new product's quality is revealed only after a customer purchases it and leaves a review. The authors characterize the regret-minimizing exploration assortments and report two structural results. First, it is always optimal to display a new product alongside the top incumbent products rather than alone. Second, the optimal number of new products to explore simultaneously follows a simple threshold structure: it increases with the "potential" of the new products and does not depend on their individual purchase probabilities. They also show that two canonical bandit algorithms fail in this setting for opposite reasons: UCB over-explores, while Thompson Sampling under-explores. The results offer structural insights into how platforms should learn about new products through assortment decisions.

📝 Abstract

We study online learning for new products on a platform that makes capacity-constrained assortment decisions on which products to offer. For a newly listed product, its quality is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate reviews to learn about new products. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the optimal assortments for exploration to minimize regret, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the purchase probability of the new product but yields lower short-term revenue. Despite the lower purchase probability, we show it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores while Thompson Sampling under-explores. Our results provide structural insights on how platforms should learn about new products through assortment decisions.
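
The trade-off in question (1) can be illustrated with a small numerical sketch. The abstract does not state the choice model, so the code below assumes a standard multinomial logit (MNL) model with an outside option of weight 1; the function names, product names, preference weights, and prices are all hypothetical and chosen only for illustration. It shows why offering a new product alone raises its purchase probability while lowering expected short-term revenue. It does not reproduce the paper's regret analysis or its optimality result.

```python
from typing import Dict, List


def mnl_choice_probs(assortment: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Purchase probability of each offered product under an ASSUMED MNL model.

    The outside (no-purchase) option has weight 1, so
    P(i | S) = w_i / (1 + sum of w_j over j in S).
    """
    denom = 1.0 + sum(weights[i] for i in assortment)
    return {i: weights[i] / denom for i in assortment}


def expected_revenue(assortment: List[str], weights: Dict[str, float],
                     prices: Dict[str, float]) -> float:
    """Expected single-customer revenue of an assortment under the same model."""
    probs = mnl_choice_probs(assortment, weights)
    return sum(prices[i] * probs[i] for i in assortment)


# Hypothetical numbers: the new product has a lower preference weight than
# the incumbents, reflecting lower demand while its quality is unknown.
weights = {"new": 0.5, "inc_a": 2.0, "inc_b": 1.5}
prices = {"new": 1.0, "inc_a": 1.0, "inc_b": 1.0}

alone = ["new"]
paired = ["new", "inc_a", "inc_b"]

p_alone = mnl_choice_probs(alone, weights)["new"]    # 0.5 / 1.5
p_paired = mnl_choice_probs(paired, weights)["new"]  # 0.5 / 5.0
rev_alone = expected_revenue(alone, weights, prices)
rev_paired = expected_revenue(paired, weights, prices)

print(f"P(buy new | alone)  = {p_alone:.3f}, revenue = {rev_alone:.3f}")
print(f"P(buy new | paired) = {p_paired:.3f}, revenue = {rev_paired:.3f}")
```

With these assumed numbers, the new product is purchased with probability about 0.333 when offered alone and 0.100 when paired, while expected revenue is about 0.333 alone and 0.800 paired. This is the tension the abstract describes: faster learning on one side, higher short-term revenue on the other.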

Problem

Research questions and friction points this paper is trying to address.

assortment optimization

online learning

new product exploration

social learning

multi-armed bandits

Innovation

Methods, ideas, or system contributions that make the work stand out.

assortment optimization

online learning

social learning