🤖 AI Summary
This paper studies online maximization of M♮-concave functions under bandit feedback, where the objective is unknown a priori and can only be queried through noisy observations. It considers two fundamental settings: stochastic multi-armed bandits and adversarial full information. Methodologically, it combines discrete convex analysis, stochastic optimization, a complexity-theoretic reduction from the matroid intersection problem for three matroids, and greedy policy analysis. Theoretically, it establishes that, unless P = NP, no algorithm running in polynomial time per round can achieve O(T^{1−c}) regret for any constant c > 0 in the adversarial full-information setting. In the stochastic setting, it demonstrates the robustness of the greedy algorithm to local estimation errors, yielding an O(T^{−1/2}) simple regret bound and an O(T^{2/3}) regret bound. Together, these results delineate the statistical and computational trade-offs inherent in online discrete convex optimization.
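The greedy algorithm at the heart of the stochastic results is the classical one for M♮-concave set functions: starting from the empty set, repeatedly add the element with the largest marginal gain until no element improves the value. The sketch below is an illustrative instance of that greedy scheme on a toy gross-substitutes valuation (the sum of the two largest weights in the chosen set); the function names and the example valuation are ours, not the paper's.

```python
def greedy_maximize(f, ground_set):
    """Greedy maximization of a set function f over subsets of ground_set.

    For M♮-concave (gross substitutes) f, repeatedly adding the element
    with the largest positive marginal gain reaches a global maximizer.
    """
    X = set()
    while True:
        best_gain, best_elem = 0.0, None
        for e in ground_set - X:
            gain = f(X | {e}) - f(X)  # marginal gain of adding e
            if gain > best_gain:
                best_gain, best_elem = gain, e
        if best_elem is None:  # no element strictly improves f
            return X
        X.add(best_elem)

# Toy M♮-concave function: a 2-demand valuation, i.e., the sum of the
# two largest weights among the selected elements.
weights = {0: 5.0, 1: 3.0, 2: 1.0, 3: 4.0}
f = lambda X: sum(sorted((weights[i] for i in X), reverse=True)[:2])
print(greedy_maximize(f, set(weights)))  # → {0, 3}
```

Here greedy first picks element 0 (gain 5), then element 3 (gain 4), after which every remaining element has zero marginal gain, so it stops at the optimum.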
📝 Abstract
M${}^{\natural}$-concave functions, a.k.a. gross substitute valuation functions, play a fundamental role in many fields, including discrete mathematics and economics. In practice, perfect knowledge of M${}^{\natural}$-concave functions is often unavailable a priori, and we can optimize them only interactively based on some feedback. Motivated by such situations, we study online M${}^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). For the stochastic bandit setting, we present $O(T^{-1/2})$-simple regret and $O(T^{2/3})$-regret algorithms under $T$ times access to unbiased noisy value oracles of M${}^{\natural}$-concave functions. A key to proving these results is the robustness of the greedy algorithm to local errors in M${}^{\natural}$-concave function maximization, which is one of our main technical results. While we obtain these positive results for the stochastic setting, another main result of our work is an impossibility in the adversarial setting. We prove that, even with full-information feedback, no algorithm that runs in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c>0$ unless $\mathsf{P} = \mathsf{NP}$. Our proof is based on a reduction from the matroid intersection problem for three matroids, which we believe is a novel idea in the context of online learning.