AI Summary
This paper studies Follow-the-Perturbed-Leader (FTPL) policies for multi-armed bandits in both stochastic and adversarial environments, using unbounded Fréchet-type perturbations. First, we establish Best-of-Both-Worlds (BOBW) guarantees for FTPL under a broad family of *asymmetric* unbounded Fréchet-type perturbations. Second, we examine the connection between the $1/2$-Tsallis entropy and *symmetric* Fréchet-type perturbations, which reveals a fundamental difference between the two-armed and multi-armed settings. Third, by revisiting FTRL-FTPL duality, we extend the BOBW guarantees to *hybrid* perturbations that combine Gumbel-type and Fréchet-type tails. For symmetric perturbations, we prove the first BOBW guarantee in the two-armed case, and we exhibit a multi-armed instance in which they violate the key condition of the standard BOBW analysis. Numerical observations support the correspondence between the $1/2$-Tsallis entropy and the symmetric Fréchet-type perturbation.
Abstract
Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers, whereas analogous results for Follow-the-Perturbed-Leader (FTPL) remain limited due to inherent analytical challenges. To advance the analytical foundations of FTPL, we revisit classical FTRL-FTPL duality for unbounded perturbations and establish BOBW results for FTPL under a broad family of asymmetric unbounded Fréchet-type perturbations, including hybrid perturbations combining Gumbel-type and Fréchet-type tails. These results not only extend the BOBW results of FTPL but also offer new insights into designing alternative FTPL policies competitive with hybrid regularization approaches. Motivated by earlier observations in two-armed bandits, we further investigate the connection between the $1/2$-Tsallis entropy and a Fréchet-type perturbation. Our numerical observations suggest that it corresponds to a symmetric Fréchet-type perturbation, and based on this, we establish the first BOBW guarantee for symmetric unbounded perturbations in the two-armed setting. In contrast, in general multi-armed bandits, we find an instance in which symmetric Fréchet-type perturbations violate the key condition for standard BOBW analysis, which is a problem not observed with asymmetric or nonnegative Fréchet-type perturbations. Although this example does not rule out alternative analyses achieving BOBW results, it suggests the limitations of directly applying the relationship observed in two-armed cases to the general case and thus emphasizes the need for further investigation to fully understand the behavior of FTPL in broader settings.
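To make the FTPL mechanism concrete, the following is a minimal sketch of a single arm-selection step with nonnegative Fréchet perturbations. It is an illustration under stated assumptions, not the policy analyzed in the paper: the shape parameter `alpha = 2`, the learning rate `eta`, and all function names are hypothetical choices, and the loss-estimation step needed in the bandit setting is omitted. Because FTPL arm probabilities generally have no closed form, the sketch estimates them by Monte Carlo.

```python
import math
import random


def sample_frechet(alpha, rng):
    """Draw from a Frechet law with CDF exp(-x**(-alpha)), x > 0 (inverse transform)."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def ftpl_choose_arm(cum_loss_estimates, eta, alpha, rng):
    """Return the arm that minimises the perturbed cumulative loss estimate."""
    perturbed = [
        loss - sample_frechet(alpha, rng) / eta  # large perturbation favours the arm
        for loss in cum_loss_estimates
    ]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def estimate_arm_probabilities(cum_loss_estimates, eta, alpha=2.0,
                               n_samples=20000, seed=0):
    """Monte Carlo estimate of the FTPL arm-selection probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(cum_loss_estimates)
    for _ in range(n_samples):
        counts[ftpl_choose_arm(cum_loss_estimates, eta, alpha, rng)] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    # Hypothetical cumulative loss estimates for three arms.
    print(estimate_arm_probabilities([0.0, 1.0, 3.0], eta=1.0))
```

The heavy Fréchet tail keeps the selection probability of every arm bounded away from zero more strongly than a light-tailed perturbation would, which is the exploration behavior that the comparison with Tsallis-entropy regularization concerns.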