FraPPE: Fast and Efficient Preference-based Pure Exploration

📅 2025-08-22
🤖 AI Summary
This paper studies pure exploration in vector-valued multi-armed bandits whose reward vectors are ordered by a preference cone, with the goal of identifying the set of Pareto-optimal arms at a given confidence level using as few samples as possible. Existing algorithms are either computationally inefficient or do not track the known lower bound for arbitrary preference cones. The authors propose FraPPE, which solves the maxmin optimisation problem in the lower bound efficiently: three structural properties of the lower bound yield a tractable reduction of the minimisation subproblem, and a Frank-Wolfe optimiser accelerates the maximisation step, for an overall cost of $\mathcal{O}(KL^2)$ with $K$ arms and $L$-dimensional rewards. The algorithm is proved to be asymptotically optimal, meaning its sample complexity matches the information-theoretic lower bound. Experiments on synthetic and real datasets show that FraPPE identifies the exact Pareto set with the lowest sample complexity among the compared algorithms.

📝 Abstract
Preference-based Pure Exploration (PrePEx) aims to identify with a given confidence level the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone $\mathcal{C}$. Though PrePEx and its variants are well-studied, there does not exist a computationally efficient algorithm that can optimally track the existing lower bound for arbitrary preference cones. We successfully fill this gap by efficiently solving the minimisation and maximisation problems in the lower bound. First, we derive three structural properties of the lower bound that yield a computationally tractable reduction of the minimisation problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation problem in the lower bound. Together, these techniques solve the maxmin optimisation problem in $\mathcal{O}(KL^{2})$ time for a bandit instance with $K$ arms and $L$ dimensional reward, which is a significant acceleration over the literature. We further prove that our proposed PrePEx algorithm, FraPPE, asymptotically achieves the optimal sample complexity. Finally, we perform numerical experiments across synthetic and real datasets demonstrating that FraPPE achieves the lowest sample complexities to identify the exact Pareto set among the existing algorithms.
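To make the notion of a Pareto set under a preference cone concrete, here is a minimal sketch that computes it for known mean vectors. It is not the FraPPE algorithm (which must work from noisy samples). It assumes a common convention that is not spelled out in the abstract: the cone is polyhedral and pointed, written as $\mathcal{C} = \{x : Wx \ge 0\}$, and arm $i$ is dominated by arm $j$ when $\mu_j - \mu_i \in \mathcal{C} \setminus \{0\}$. The function name `pareto_set` and the matrix `W` are illustrative choices.

```python
import numpy as np

def pareto_set(means, W, tol=1e-12):
    """Indices of arms not dominated under the cone C = {x : W @ x >= 0}.

    Assumption (not from the paper): W has full column rank, so C is pointed
    and "W @ d >= 0 with at least one strict inequality" means d is in C \\ {0}.
    """
    K = means.shape[0]
    optimal = []
    for i in range(K):
        dominated = False
        for j in range(K):
            if i == j:
                continue
            d = W @ (means[j] - means[i])
            # Arm j dominates arm i if the difference lies in C and is nonzero.
            if np.all(d >= -tol) and np.any(d > tol):
                dominated = True
                break
        if not dominated:
            optimal.append(i)
    return optimal

# Four arms with 2-dimensional mean rewards (made-up numbers).
means = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]])

# Positive orthant: the usual componentwise Pareto order.
orthant = np.eye(2)
# A wider cone {x1 >= 0, x1 + x2 >= 0}: more pairs become comparable.
wider = np.array([[1.0, 0.0], [1.0, 1.0]])

print(pareto_set(means, orthant))
print(pareto_set(means, wider))
```

With the positive orthant, arms 0, 1 and 2 are mutually incomparable and only arm 3 is dominated. Widening the cone makes more arms comparable, so the Pareto set shrinks to arm 0 in this example.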
Problem

Research questions and friction points this paper is trying to address.

Efficiently solving the minimisation and maximisation problems in the lower bound
Accelerating the maxmin optimisation for preference-based pure exploration
Achieving the optimal sample complexity for Pareto set identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tractable reduction of the minimisation problem via three structural properties of the lower bound
Frank-Wolfe optimiser to accelerate the maximisation problem
O(KL²) time for the maxmin optimisation with K arms and L-dimensional rewards
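For readers unfamiliar with Frank-Wolfe, the following minimal sketch shows the generic method on the probability simplex, the natural domain for an allocation over arms. It is an illustration only: the objective is a toy concave quadratic, not the lower-bound objective from the paper, and the function name `frank_wolfe_simplex`, the step size $2/(t+2)$ and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def frank_wolfe_simplex(grad, K, n_iters=2000):
    """Maximise a smooth concave function over the K-dimensional simplex.

    grad: callable returning the gradient at w (hypothetical interface).
    Each step moves towards the simplex vertex that maximises the linearised
    objective, so no projection is needed.
    """
    w = np.full(K, 1.0 / K)           # start from the uniform allocation
    for t in range(n_iters):
        g = grad(w)
        vertex = np.zeros(K)
        vertex[np.argmax(g)] = 1.0    # linear maximisation oracle on the simplex
        gamma = 2.0 / (t + 2.0)       # standard diminishing step size
        w = (1.0 - gamma) * w + gamma * vertex
    return w

# Toy objective f(w) = -0.5 * ||w - target||^2, maximised at w = target.
target = np.array([0.5, 0.3, 0.2])
w_hat = frank_wolfe_simplex(lambda w: target - w, K=3)
print(w_hat)
```

Every iterate is a convex combination of simplex vertices, so it stays a valid allocation, and each step costs one gradient evaluation plus an `argmax` over the arms.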
Udvas Das
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL, Lille, France
Apurv Shukla
Department of EECS, University of Michigan, Ann Arbor, MI, USA
Debabrota Basu
Faculty, Inria at University of Lille and CNRS (CRIStAL), ELLIS Scholar
Reinforcement Learning, Multi-armed Bandits, Differential Privacy, Fairness, Optimization