Efficient Multinomial Logistic Bandit via Frequent Directions

📅 2026-06-10
🤖 AI Summary
This work addresses the high computational cost and poor scalability of existing algorithms for high-dimensional multinomial logistic bandits. The authors introduce Frequent Directions matrix sketching into this framework for the first time, integrating it with online Newton steps and spectral norm approximation so that high-dimensional parameter updates and upper confidence bound computations reduce to low-dimensional tasks. The proposed method achieves a per-round time complexity of $\mathcal{O}(Kd(m+K)^2)$ and space complexity of $\mathcal{O}(Kd(m+K))$, where $m$ is the sketch size, while keeping the regret close to that of the unsketched baseline when the Hessian is approximately low-rank. Experiments indicate that the algorithm is competitive in both computational efficiency and performance.
📝 Abstract
This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $\tilde{\mathcal{O}}(Kd\sqrt{T})$, but still requires $\mathcal{O}(K^3d^3)$ time and $\mathcal{O}(K^2d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $\mathcal{O}(Kd(m+K)^2)$ and space complexity $\mathcal{O}(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
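To make the sketching idea in the abstract concrete, here is a minimal NumPy sketch of the generic Frequent Directions update. It is not the paper's EOFD-MLogB algorithm: the function names `fd_update` and `fd_sketch`, the synthetic matrix `A`, and the choice to shrink by the smallest squared singular value at every step are all assumptions made for illustration. The sketch `B` has `m` rows, and `B.T @ B` stands in for the full `d x d` matrix `A.T @ A`.

```python
import numpy as np

def fd_update(B, a):
    """Fold one row `a` into the m x d sketch B (generic Frequent Directions).

    Assumes m <= d and that the last row of B is zero on entry.
    Returns the new sketch and the amount `delta` shrunk in this step.
    """
    m, d = B.shape
    assert m <= d, "this illustrative version assumes sketch size m <= d"
    B = B.copy()
    B[-1] = a                                   # place the new row in the empty slot
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    delta = s[-1] ** 2                          # smallest squared singular value
    shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    return shrunk[:, None] * Vt, delta          # last row becomes zero again

def fd_sketch(A, m):
    """Stream the rows of A through fd_update; return the sketch and total shrinkage."""
    B = np.zeros((m, A.shape[1]))
    total_shrink = 0.0
    for a in A:
        B, delta = fd_update(B, a)
        total_shrink += delta
    return B, total_shrink

# Hypothetical data: an approximately rank-5 matrix with small noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 30))
A += 0.01 * rng.normal(size=(300, 30))

m = 10
B, total_shrink = fd_sketch(A, m)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)      # spectral-norm sketching error
bound = np.linalg.norm(A, "fro") ** 2 / m       # standard worst-case guarantee
print(f"error={err:.4g}, total_shrink={total_shrink:.4g}, bound={bound:.4g}")
```

The accumulated shrinkage upper-bounds the spectral-norm error of the sketch, and it is small when the streamed matrix is close to low-rank. This is the same qualitative behaviour the abstract attributes to the factor $\Delta_T$, although the exact definition of $\Delta_T$ is given in the paper and not reproduced here.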
Problem

Research questions and friction points this paper is trying to address.

multinomial logistic bandit
high-dimensional
computational efficiency
online learning
regret minimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multinomial Logistic Bandit
Frequent Directions
Matrix Sketching
Online Newton Method
Regret Analysis