Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities

📅 2025-07-06
🤖 AI Summary
This work introduces quantum computing to the Bandits with Knapsacks (BwK) problem, a stochastic integer programming framework for resource-constrained online decision-making. We propose an algorithmic framework that accesses reward and resource-consumption information through quantum oracles and combines it with an inexact quantum linear programming solver. In the problem-independent setting, our algorithm achieves a regret bound of $O\big(\sqrt{T} + \sqrt{T \cdot B/\mathrm{OPT}_{\mathrm{LP}}}\big)$, yielding a $(1+\sqrt{B/\mathrm{OPT}_{\mathrm{LP}}})$-factor improvement over the optimal classical regret bound. In the problem-dependent setting, it attains a quadratic improvement in regret with respect to the problem-dependent parameters and a polynomial speedup in time complexity with respect to the problem's dimensions. This is the first work to establish rigorous quantum advantage guarantees for BwK, bridging quantum optimization and online learning under budget constraints.

📝 Abstract
Bandits with knapsacks (BwK) constitute a fundamental model that combines aspects of stochastic integer programming with online learning. Classical algorithms for BwK with a time horizon $T$ achieve a problem-independent regret bound of $O(\sqrt{T})$ and a problem-dependent bound of $O(\log T)$. In this paper, we initiate the study of the BwK model in the setting of quantum computing, where both reward and resource consumption can be accessed via quantum oracles. We establish both problem-independent and problem-dependent regret bounds for quantum BwK algorithms. For the problem-independent case, we demonstrate that a quantum approach can improve the classical regret bound by a factor of $(1+\sqrt{B/\mathrm{OPT}_{\mathrm{LP}}})$, where $B$ is the budget constraint in BwK and $\mathrm{OPT}_{\mathrm{LP}}$ denotes the optimal value of a linear programming relaxation of the BwK problem. For the problem-dependent setting, we develop a quantum algorithm using an inexact quantum linear programming solver. This algorithm achieves a quadratic improvement in terms of the problem-dependent parameters, as well as a polynomial speedup in time complexity with respect to the problem's dimensions compared to classical counterparts. Compared to previous works on quantum algorithms for multi-armed bandits, our study is the first to consider bandit models with resource constraints and hence sheds light on operations research.
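To make the quantities $B$, $\mathrm{OPT}_{\mathrm{LP}}$, and the factor $(1+\sqrt{B/\mathrm{OPT}_{\mathrm{LP}}})$ concrete, here is a minimal classical sketch in Python. It is not the paper's algorithm and involves no quantum computation. It assumes the standard BwK linear programming relaxation with a single resource, known mean rewards and mean costs, and an implicit null arm with zero reward and zero cost. The function names `opt_lp_single_resource` and `improvement_factor`, and the example numbers, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def opt_lp_single_resource(rewards, costs, horizon, budget):
    """Optimal value of the assumed LP relaxation with one resource.

    maximize   horizon * sum_i rewards[i] * x[i]
    subject to horizon * sum_i costs[i] * x[i] <= budget
               sum_i x[i] <= 1,  x[i] >= 0   (slack = implicit null arm)

    With only two constraints, some optimal vertex mixes at most two
    arms, so enumerating single arms and pairs is enough.
    """
    b = budget / horizon  # average budget available per round
    best = 0.0            # playing only the null arm earns nothing

    # Vertices that use a single arm (possibly mixed with the null arm).
    for r, c in zip(rewards, costs):
        x = 1.0 if c <= b else b / c
        best = max(best, horizon * r * x)

    # Vertices that mix two arms with both constraints tight.
    for i, j in combinations(range(len(rewards)), 2):
        if costs[i] == costs[j]:
            continue  # the two tight constraints have no unique solution
        xi = (b - costs[j]) / (costs[i] - costs[j])
        if 0.0 <= xi <= 1.0:
            xj = 1.0 - xi
            best = max(best, horizon * (rewards[i] * xi + rewards[j] * xj))
    return best


def improvement_factor(budget, opt_lp):
    """The factor (1 + sqrt(B / OPT_LP)) quoted in the abstract."""
    return 1.0 + math.sqrt(budget / opt_lp)


if __name__ == "__main__":
    # Hypothetical instance: two arms, T = 1000 rounds, budget B = 300.
    opt = opt_lp_single_resource([0.9, 0.5], [1.0, 0.2], 1000, 300)
    print(opt, improvement_factor(300, opt))
```

In this toy instance the relaxation prefers a mix of the expensive high-reward arm and the cheap low-reward arm over either arm alone, which is why the optimum of the relaxation is a fractional distribution over arms rather than a single arm.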
Problem

Research questions and friction points this paper is trying to address.

Quantum algorithms for the Bandits with Knapsacks (BwK) model
Improving regret bounds in quantum BwK algorithms
Achieving polynomial speedup in time complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum oracles access reward and resource data
Inexact quantum linear programming solver used
Quadratic improvement in problem-dependent parameters
Yuexin Su
Center on Frontiers of Computing Studies, Peking University; School of Computer Science, Peking University
Ziyi Yang
Center on Frontiers of Computing Studies, Peking University; School of Computer Science, Peking University; School of Mathematical Science, Peking University
Peiyuan Huang
Guanghua School of Management, Peking University
Tongyang Li
Center on Frontiers of Computing Studies, Peking University
Quantum Computing - Theoretical Computer Science - Optimization - Machine Learning
Yinyu Ye
Professor Emeritus, Stanford University; Visiting Professor at SJTU, CUHKSZ, and HKUST
Optimization - Operations Research - Mathematical Programming - Computational Science