🤖 AI Summary
This paper studies the thresholding bandit problem under local differential privacy (LDP): identifying the arms whose expected reward exceeds a given threshold, in both the fixed-budget and fixed-confidence settings. We propose a privacy mechanism based on Bernoulli randomized response and develop algorithms whose analysis combines concentration inequalities with information-theoretic lower bounds, so that the cost of privacy can be weighed against decision efficiency. We prove that the proposed algorithms achieve performance within poly-logarithmic factors of general lower bounds that hold for any differentially private mechanism, establishing, for the first time, near-optimal trade-offs among estimation error, privacy loss, and sampling efficiency. Experiments demonstrate efficient and robust arm identification under strong LDP guarantees, illustrating the intrinsic precision limits of sequential decision-making under privacy constraints.
📝 Abstract
This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.
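To make the idea of a Bernoulli-based private response concrete, here is a minimal Python sketch. It assumes rewards lie in [0, 1] and uses a standard randomized-response rule, reporting 1 with probability (r(e^ε − 1) + 1) / (e^ε + 1); the abstract does not spell out the exact mechanism, so this particular form is an assumption. The function names (`response_prob`, `privatize`, `debias`, `estimate_above_threshold`) are hypothetical, and the uniform-sampling plug-in rule at the end is only an illustration, not the algorithm analyzed in the paper.

```python
import math
import random


def response_prob(reward: float, epsilon: float) -> float:
    """Probability of reporting 1 for a reward in [0, 1] (assumed mechanism)."""
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    e = math.exp(epsilon)
    # The ratio of output probabilities for any two rewards is at most e^epsilon,
    # which is the epsilon-LDP condition for a binary output.
    return (reward * (e - 1.0) + 1.0) / (e + 1.0)


def privatize(reward: float, epsilon: float, rng: random.Random) -> int:
    """Draw one private Bernoulli response for a single reward."""
    return 1 if rng.random() < response_prob(reward, epsilon) else 0


def debias(private_mean: float, epsilon: float) -> float:
    """Invert the affine map so the estimate is unbiased for the true mean."""
    e = math.exp(epsilon)
    return (private_mean * (e + 1.0) - 1.0) / (e - 1.0)


def estimate_above_threshold(rewards_per_arm, threshold, epsilon, rng):
    """Naive plug-in rule: keep arms whose debiased private mean exceeds the threshold.

    Illustrative only; it samples every arm equally and ignores confidence levels.
    """
    selected = []
    for arm, rewards in enumerate(rewards_per_arm):
        reports = [privatize(r, epsilon, rng) for r in rewards]
        private_mean = sum(reports) / len(reports)
        if debias(private_mean, epsilon) > threshold:
            selected.append(arm)
    return selected
```

Under this mechanism the gap between a private mean and the privatized threshold shrinks by a factor of (e^ε − 1) / (e^ε + 1), which gives some intuition for why stronger privacy (smaller ε) requires more samples to separate arms from the threshold.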