🤖 AI Summary
This paper studies the neural logistic bandit problem: learning an unknown reward function, modeled by a neural network, under a logistic link function. Existing methods suffer from regret bounds that scale poorly with either $\kappa$ (where $1/\kappa$ is the minimum reward variance) or the ambient feature dimension $d$, rendering them ill-suited for high-dimensional neural representations. To address this, we derive a novel Bernstein-type self-normalized inequality for vector-valued martingales, enabling the first regret bound that depends only on the effective dimension $\widetilde{d}$ rather than $d$, while substantially weakening the dependence on $\kappa$. Leveraging this inequality, we propose NeuralLog-UCB-1 and NeuralLog-UCB-2, achieving near-optimal regret upper bounds of $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving upon prior work. Our theoretical advances are empirically validated on both synthetic and real-world datasets.
📝 Abstract
We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of reward distributions, or suffer from a direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $\kappa$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
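To make the setting concrete, the following is a minimal sketch of a generic logistic-UCB arm-selection rule of the kind the algorithms above build on. It is not the paper's actual NeuralLog-UCB update: the function names, the confidence radius `beta`, and the use of raw features (rather than network gradients and the paper's concentration-based radius) are simplifying assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic link function mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ucb_scores(features, theta, Z_inv, beta):
    """Optimistic scores for a generic logistic UCB rule (illustrative only).

    features : (K, d) candidate arm features; in a neural variant these would
               be network gradients at each arm, not raw features
    theta    : (d,) current parameter estimate
    Z_inv    : (d, d) inverse of the regularized design matrix
    beta     : confidence radius (a hypothetical constant here; the paper's
               analysis is about controlling this quantity tightly)
    """
    means = sigmoid(features @ theta)  # predicted mean rewards under the link
    # elliptical confidence widths sqrt(x^T Z^{-1} x) for each arm
    widths = np.sqrt(np.einsum("kd,de,ke->k", features, Z_inv, features))
    return means + beta * widths       # optimism-in-the-face-of-uncertainty

# Toy usage: pick the arm with the largest optimistic score.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
scores = ucb_scores(X, theta=np.zeros(3), Z_inv=np.eye(3), beta=0.5)
arm = int(np.argmax(scores))
```

With `theta = 0`, every arm's predicted mean is `sigmoid(0) = 0.5`, so the rule reduces to picking the arm with the largest exploration width; as data accumulate and `Z_inv` shrinks, the exploitation term dominates.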