Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

📅 2026-05-16

🤖 AI Summary

This study addresses the dynamic joint assortment selection and positioning problem, in which a product's attractiveness depends on both its intrinsic quality and its display position. Under a multinomial logit (MNL) choice model, the authors develop round-based online learning algorithms that update decisions after every single piece of feedback, for both multiplicative and general position effects. The key contribution is a pair of matching upper and lower regret bounds for each model. In the multiplicative case, this closes the √K gap left by prior epoch-based analyses. Round-based updating also supports real-time decision-making. The approach combines a cross-position pairwise maximum likelihood estimator, upper confidence bound (UCB) exploration, Dinkelbach's method, and maximum-weight bipartite matching. Experiments on synthetic data and the real-world Expedia dataset show that the algorithms consistently outperform existing benchmarks, pairing theoretical optimality with practical effectiveness.
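To make the two attraction models concrete, here is a minimal Python sketch of MNL choice probabilities when products are placed in positions. It assumes the common MNL convention that the no-purchase option has attraction 1 and that each position shows at most one product. The symbols `v` (intrinsic appeal), `lam` (position factors), `w` (product-position attractions), `r` (revenues), and all function names are illustrative assumptions, not notation taken from the paper.

```python
import numpy as np


def multiplicative_attractions(v, lam):
    """Multiplicative model: w[i, k] = v[i] * lam[k] for product i in position k."""
    return np.outer(v, lam)


def choice_probabilities(w, placement):
    """MNL choice probabilities for a placement {position k: product i}.

    The no-purchase attraction is normalized to 1 (an assumed convention).
    Any N x K attraction matrix w works, so this also covers a general model
    with one independent attraction per product-position pair.
    """
    attr = {i: w[i, k] for k, i in placement.items()}
    denom = 1.0 + sum(attr.values())
    probs = {i: a / denom for i, a in attr.items()}
    return probs, 1.0 / denom


def expected_revenue(w, r, placement):
    """Expected revenue: sum over displayed products of r[i] * P(choose i)."""
    probs, _ = choice_probabilities(w, placement)
    return sum(r[i] * p for i, p in probs.items())


# Hypothetical example: 3 products, 2 positions, position 0 is more prominent.
v = np.array([1.0, 0.5, 2.0])   # intrinsic attractions (made-up values)
lam = np.array([1.0, 0.5])      # position factors (made-up values)
w = multiplicative_attractions(v, lam)
placement = {0: 2, 1: 0}        # product 2 in position 0, product 0 in position 1
probs, p_none = choice_probabilities(w, placement)
# Attractions 2.0 and 0.5, so P(2) = 2/3.5, P(0) = 0.5/3.5, no purchase = 1/3.5.
```

For a general position effects model, replace `multiplicative_attractions(v, lam)` with an arbitrary N x K matrix of product-position attractions. The choice and revenue functions stay unchanged.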

📝 Abstract

We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model that assigns an independent attraction parameter to every product-position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single piece of feedback, and establish the first regret-optimal characterization. In addition, these round-based algorithms enable the prompt decision updates that modern platforms require. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, we establish a minimax lower bound and propose GP2-UCB with a matching upper bound. Moreover, we design an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that our algorithms consistently outperform state-of-the-art benchmarks.
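The per-round optimization maximizes an MNL revenue ratio, sum of r_i * w_ik over (1 + sum of w_ik), taken over partial matchings of products to positions. Dinkelbach's method reduces this ratio to a sequence of linear problems: for a guess theta, find the matching that maximizes sum of (r_i - theta) * w_ik, reset theta to that matching's revenue, and stop once the best value of sum (r_i - theta) * w_ik - theta reaches zero. Below is a minimal Python sketch under stated assumptions: the attractions `w` (for instance, optimistic UCB estimates) and per-product revenues `r` are given, the no-purchase attraction is 1, and each position holds at most one product. The function names are hypothetical, and SciPy's `linear_sum_assignment` stands in for the max-weight bipartite matching step. This is not the authors' implementation of the subroutine.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def max_weight_matching(weights):
    """Max-weight partial matching of products (rows) to positions (columns).

    Negative edges are never worth using, so clip them to zero, solve a full
    rectangular assignment, and keep only pairs with positive weight.
    Returns {position k: product i}.
    """
    clipped = np.maximum(weights, 0.0)
    rows, cols = linear_sum_assignment(clipped, maximize=True)
    return {int(k): int(i) for i, k in zip(rows, cols) if clipped[i, k] > 0}


def revenue(w, r, placement):
    """MNL expected revenue of a placement, with no-purchase attraction 1."""
    num = sum(r[i] * w[i, k] for k, i in placement.items())
    den = 1.0 + sum(w[i, k] for k, i in placement.items())
    return num / den


def dinkelbach_assortment(w, r, tol=1e-10, max_iter=100):
    """Dinkelbach iterations for max over matchings M of N(M) / D(M)."""
    theta, placement = 0.0, {}
    for _ in range(max_iter):
        # Linear subproblem: maximize sum over (i, k) in M of (r_i - theta) * w_ik.
        candidate = max_weight_matching((r[:, None] - theta) * w)
        f_val = sum((r[i] - theta) * w[i, k] for k, i in candidate.items()) - theta
        if f_val <= tol:  # F(theta) ~ 0: theta is the optimal revenue ratio
            break
        placement = candidate
        theta = revenue(w, r, placement)
    return placement, theta


# Hypothetical example: 3 products, 2 positions, made-up attractions and revenues.
w = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [0.5, 0.25]])
r = np.array([1.2, 0.8, 3.0])
best_placement, best_revenue = dinkelbach_assortment(w, r)
# Product 2 in position 0 and product 0 in position 1: revenue 2.1 / 2.0 = 1.05.
```

Clipping negative edge weights to zero lets a rectangular assignment solver return a partial matching. Any pair with no positive weight is left unplaced, so a product whose revenue falls below the current theta drops out of the assortment.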

Problem

Research questions and friction points this paper is trying to address.

assortment selection

position effects

multinomial logit bandits

dynamic optimization

product positioning

Innovation

Methods, ideas, or system contributions that make the work stand out.

position-aware MNL bandits

regret-optimal learning

multiplicative position effects