🤖 AI Summary
This paper studies the non-contextual multi-armed bandit problem under transfer learning: before the target task begins, the learner observes i.i.d. samples from each source distribution and knows that the distance between the $k$-th target and source distributions satisfies $d_k(\nu_k, \nu'_k) \leq L_k$. For this setting, we establish, for the first time, a problem-dependent asymptotic regret lower bound parameterized by the transfer quantities. Building upon this, we propose KL-UCB-Transfer, a novel algorithm that integrates KL-divergence-based upper confidence bounds with the transfer prior, and show that it achieves asymptotically optimal cumulative regret in Gaussian environments. The algorithm adaptively estimates distributional shifts using the source samples and tightens its confidence intervals accordingly. Experiments demonstrate that when source and target distributions are close, KL-UCB-Transfer significantly outperforms non-transfer baselines and tightly matches the theoretical lower bound.
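To make the "adjusts confidence intervals" idea concrete, here is a minimal sketch of what a transfer-informed Gaussian UCB index could look like. This is an illustrative reconstruction, not the paper's exact algorithm: the helper `kl_ucb_transfer_index` and its clipping rule are assumptions. For unit-variance Gaussians, $\mathrm{KL}(\mu, \mu') = (\mu - \mu')^2 / 2$, so the classical KL-UCB index reduces to the mean plus a $\sqrt{2 \log t / n}$ radius; the transfer prior then caps that index using the source estimate and the known shift bound $L_k$.

```python
import math

def kl_ucb_transfer_index(mean, pulls, t, src_mean, src_pulls, L, sigma=1.0):
    """Hypothetical sketch of a transfer-informed Gaussian KL-UCB index.

    For variance sigma^2 Gaussians, KL(mu, mu') = (mu - mu')^2 / (2 sigma^2),
    so the classical KL-UCB index is mean + sigma * sqrt(2 log t / pulls).
    The transfer prior adds a second upper bound: the target mean cannot
    exceed the source estimate by more than L (plus source sampling noise),
    and the index is the tighter of the two bounds.
    """
    # Classical KL-UCB radius from the learner's own target-task pulls.
    ucb = mean + sigma * math.sqrt(2.0 * math.log(max(t, 2)) / pulls)
    # Upper bound implied by the N'_k source samples and the shift bound L.
    src_ucb = src_mean + sigma * math.sqrt(2.0 * math.log(max(t, 2)) / src_pulls) + L
    return min(ucb, src_ucb)
```

With few target pulls the classical radius is wide; a tight shift bound $L_k$ and many source samples let the second term dominate, which is exactly the regime where transfer helps in the experiments.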
📝 Abstract
We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given $N'_k$ i.i.d. samples from each source distribution $\nu'_k$, and the true target distributions $\nu_k$ lie within a known distance bound $d_k(\nu_k, \nu'_k) \leq L_k$. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai-Robbins result to incorporate the transfer parameters $(d_k, L_k, N'_k)$. We then propose KL-UCB-Transfer, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that KL-UCB-Transfer significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.