🤖 AI Summary
This paper studies the neural logistic bandit problem: learning an unknown reward function, modeled by a neural network, under a logistic link function. Existing methods suffer from regret bounds that scale poorly with either $\kappa$ (where $1/\kappa$ is the minimum reward variance) or the ambient feature dimension $d$, rendering them ill-suited for high-dimensional neural representations. To address this, we derive a novel Bernstein-type self-normalized inequality for vector-valued martingales, enabling the first regret bound that depends only on the effective dimension $\widetilde{d}$ rather than $d$, while substantially weakening the dependence on $\kappa$. Leveraging this inequality, we propose NeuralLog-UCB-1 and NeuralLog-UCB-2, achieving near-optimal regret upper bounds of $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving upon prior work. Our theoretical advances are empirically validated on both synthetic and real-world datasets.
📝 Abstract
We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of reward distributions, or suffer from a direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $\kappa$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
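To make the setting concrete, the following is a minimal sketch of a generic logistic-UCB arm-selection rule of the kind the algorithms above build on. It is not the paper's actual NeuralLog-UCB update: the function names, the confidence radius `beta`, and the use of raw features (rather than network gradients and the paper's concentration-based radius) are simplifying assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic link function mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ucb_scores(features, theta, Z_inv, beta):
    """Optimistic scores for a generic logistic UCB rule (illustrative only).

    features : (K, d) candidate arm features; in a neural variant these would
               be network gradients at each arm, not raw features
    theta    : (d,) current parameter estimate
    Z_inv    : (d, d) inverse of the regularized design matrix
    beta     : confidence radius (a hypothetical constant here; the paper's
               analysis is about controlling this quantity tightly)
    """
    means = sigmoid(features @ theta)  # predicted mean rewards under the link
    # elliptical confidence widths sqrt(x^T Z^{-1} x) for each arm
    widths = np.sqrt(np.einsum("kd,de,ke->k", features, Z_inv, features))
    return means + beta * widths       # optimism-in-the-face-of-uncertainty

# Toy usage: pick the arm with the largest optimistic score.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
scores = ucb_scores(X, theta=np.zeros(3), Z_inv=np.eye(3), beta=0.5)
arm = int(np.argmax(scores))
```

With `theta = 0`, every arm's predicted mean is `sigmoid(0) = 0.5`, so the rule reduces to picking the arm with the largest exploration width; as data accumulate and `Z_inv` shrinks, the exploitation term dominates.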