Capacity-Constrained Online Convex Optimization with Delayed Feedback

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses online convex optimization (OCO) with delayed feedback under a tracking capacity constraint: the learner can monitor at most $C$ pending rounds at once, so feedback from untracked rounds is permanently lost. To tackle this, the authors introduce a semi-clairvoyant delay model and reduce the problem to a delayed and weighted OCO framework. They propose the Delayed-Weighted Follow-the-Regularized-Leader (FTRL) algorithm and its bandit variant, combined with schedulers that randomize tracking decisions and importance-weight the resulting observations. The study establishes the first regret bounds for capacity-constrained delayed OCO under both first-order and bandit feedback, with explicit guarantees for convex and strongly convex losses. Notably, with first-order feedback, capacity $C = \Omega(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. In the bandit setting, the regret rates are modulated by powers of $(1+\sigma_{\max}/C)$, where $\sigma_{\max}$ is the maximum number of pending observations at any time, so the bound degrades gracefully when $C < \sigma_{\max}$ while remaining sublinear.

📝 Abstract

Online learning with delayed feedback typically assumes that the learner can track all pending rounds until their feedback arrives. In practice, tracking resources are finite, and feedback from untracked rounds is permanently lost. In this paper, we study delayed online convex optimization (OCO) under a hard capacity constraint, where at most $C$ pending rounds can be tracked at any time. To model delay information, we introduce a semi-clairvoyant model that refines the clairvoyant assumption from prior work: rather than requiring delays to be known at prediction time, the learner observes delay expirations online, consistent with the classical unconstrained delayed setting. Our approach proceeds via a reduction to a novel "delayed and weighted" OCO problem, using a scheduler that randomizes tracking decisions and importance-weights the resulting observations. For this base problem, we propose and analyze Delayed-Weighted FTRL and its bandit analogue, establishing regret bounds that explicitly characterize the interaction between time-varying weights and delayed feedback. Combining these base learners with our schedulers yields the first regret guarantees for capacity-constrained OCO under convex and strongly convex losses, for both first-order and bandit feedback. For first-order feedback, capacity $C = \Omega(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. For bandit feedback, the regret rates are modulated by powers of $(1 + \sigma_{\max}/C)$, where $\sigma_{\max}$ is the maximum number of pending observations at any time. This allows the regret bound to degrade gracefully when $C < \sigma_{\max}$, while remaining sublinear.
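
To make the reduction described in the abstract more concrete, the toy sketch below combines a randomized tracking rule with an FTRL update on delayed, importance-weighted gradients. It is an illustration under strong assumptions, not the paper's algorithm: the function name `run_capped_delayed_ftrl`, the one-dimensional loss $f_t(x) = (x-1)^2$, the fixed delay, the constant tracking probability `p`, the fixed step size `eta`, and the domain $[-2, 2]$ are all hypothetical choices. In particular, the weight `1/p` ignores rounds that are skipped because all `C` slots are full, so this naive estimator is only approximately unbiased; handling that case properly is the job of the scheduler analyzed in the paper.

```python
import random


def run_capped_delayed_ftrl(T=200, C=4, p=0.5, delay=3, eta=0.05, seed=0):
    """Toy 1-D run with losses f_t(x) = (x - 1)^2 on the domain [-2, 2].

    Hypothetical sketch: fixed delay, constant tracking probability p,
    and at most C tracked rounds pending at any time.
    """
    rng = random.Random(seed)
    grad_sum = 0.0      # sum of importance-weighted gradients received so far
    pending = []        # (round_played, weighted_gradient) for tracked rounds
    max_pending = 0
    x = 0.0
    for t in range(T):
        # Feedback from round s becomes usable once s + delay < t.
        grad_sum += sum(g for (s, g) in pending if s + delay < t)
        pending = [(s, g) for (s, g) in pending if s + delay >= t]

        # FTRL on linearized losses with regularizer x^2 / (2 * eta);
        # in one dimension the constrained minimizer is the clipped value.
        x = max(-2.0, min(2.0, -eta * grad_sum))

        # Randomized tracking: needs a free slot and a successful coin flip.
        # Untracked rounds never contribute feedback (it is lost).
        if len(pending) < C and rng.random() < p:
            g = 2.0 * (x - 1.0)             # gradient of (x - 1)^2 at x
            pending.append((t, g / p))      # naive importance weight 1/p
        max_pending = max(max_pending, len(pending))
    return x, max_pending


if __name__ == "__main__":
    final_x, peak = run_capped_delayed_ftrl()
    print(f"last iterate: {final_x:.3f}, peak tracked rounds: {peak}")
```

With `p=1.0`, `delay=0`, and `C=1`, every round is tracked and the update reduces to plain gradient descent, which gives a quick sanity check; smaller `p` or `C` drops feedback, and the surviving gradients are scaled up to compensate.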

Problem

Research questions and friction points this paper is trying to address.

online convex optimization

delayed feedback

capacity constraint

resource limitation

regret minimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

capacity-constrained online learning

delayed feedback

semi-clairvoyant model