Online Decision-Focused Learning

📅 2025-05-19

🤖 AI Summary

This work studies decision-focused learning (DFL) in dynamic environments, where the objective function and the data distribution evolve over time. The objective has zero or undefined gradients and is generally non-convex, so standard first-order optimization methods cannot be applied directly. The authors extend DFL to an online setting by regularizing the objective to make it differentiable and by applying the optimism principle, using a near-optimal oracle together with an appropriate perturbation. For the resulting algorithm they establish bounds on the expected dynamic regret, both when the decision space is a simplex and when it is a general bounded convex polytope. A simple knapsack experiment compares the algorithm with a classic prediction-focused approach and illustrates its effectiveness.

📝 Abstract

Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. This end-to-end strategy holds promise for tackling complex combinatorial problems; however, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging because the objective function has zero or undefined gradients -- which prevents the use of standard first-order optimization methods -- and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) make use of the optimism principle, based on a near-optimal oracle along with an appropriate perturbation. This leads to a practical online algorithm for which we establish bounds on the expected dynamic regret, both when the decision space is a simplex and when it is a general bounded convex polytope. Finally, we demonstrate the effectiveness of our algorithm by comparing its performance with a classic prediction-focused approach on a simple knapsack experiment.
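The zero-gradient issue and the effect of regularization can be seen on a toy problem. The sketch below is illustrative only and is not the paper's algorithm: it assumes a linear cost minimized over the probability simplex and uses entropic regularization (which yields a softmax), a common choice that the abstract does not confirm. The names `hard_decision`, `soft_decision`, and `temperature` are hypothetical.

```python
import numpy as np

def hard_decision(costs):
    """Linear minimization over the simplex: all mass on the cheapest item."""
    w = np.zeros_like(costs, dtype=float)
    w[np.argmin(costs)] = 1.0
    return w

def soft_decision(costs, temperature=1.0):
    """Entropy-regularized minimizer over the simplex (assumed regularizer):
    softmax(-costs / temperature)."""
    z = -np.asarray(costs, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decision_loss(decide, predicted, true_costs):
    """True cost of the decision obtained from the *predicted* costs."""
    return float(true_costs @ decide(predicted))

def finite_diff(decide, predicted, true_costs, eps=1e-4):
    """Central finite-difference gradient of the decision loss w.r.t. the prediction."""
    grad = np.zeros_like(predicted)
    for i in range(predicted.size):
        step = np.zeros_like(predicted)
        step[i] = eps
        up = decision_loss(decide, predicted + step, true_costs)
        down = decision_loss(decide, predicted - step, true_costs)
        grad[i] = (up - down) / (2 * eps)
    return grad

predicted = np.array([1.0, 2.0, 3.0])   # model's cost estimates
true_costs = np.array([3.0, 1.0, 2.0])  # costs actually incurred

hard_grad = finite_diff(hard_decision, predicted, true_costs)
soft_grad = finite_diff(soft_decision, predicted, true_costs)
print("gradient through hard argmin:", hard_grad)
print("gradient through softmax    :", soft_grad)
```

The hard decision is piecewise constant in the prediction, so small perturbations leave the loss unchanged and the gradient carries no training signal; the regularized decision varies smoothly and gives a usable gradient.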

Problem

Research questions and friction points this paper is trying to address.

Addressing dynamic environments in Decision-Focused Learning (DFL)

Handling non-differentiable and non-convex objective functions in DFL

Developing an online algorithm for DFL with dynamic regret bounds
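To make the notion of dynamic regret in the last item concrete, here is a minimal sketch under simplifying assumptions: decisions live on the simplex, per-round losses are linear, and the comparator is the best decision in each round. The paper's regret is taken in expectation and its exact formulation may differ; `dynamic_regret` and `static_regret` are hypothetical helper names.

```python
import numpy as np

def dynamic_regret(round_costs, played):
    """Cumulative loss minus the loss of the best decision in *each* round."""
    incurred = np.einsum("tn,tn->t", round_costs, played)
    best_per_round = round_costs.min(axis=1)  # best simplex vertex per round
    return float(np.sum(incurred - best_per_round))

def static_regret(round_costs, played):
    """Cumulative loss minus the loss of the best *fixed* decision in hindsight."""
    incurred = np.einsum("tn,tn->t", round_costs, played)
    best_fixed = round_costs.sum(axis=0).min()
    return float(incurred.sum() - best_fixed)

# Two rounds in which the cheapest item switches.
round_costs = np.array([[1.0, 2.0],
                        [2.0, 1.0]])
# A learner that always picks the first item.
played = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

dyn = dynamic_regret(round_costs, played)
sta = static_regret(round_costs, played)
print("dynamic regret:", dyn)
print("static regret :", sta)
```

In this toy run the learner has no regret against any fixed decision but positive regret against the moving comparator, which is why dynamic regret is the more demanding measure when the environment changes.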

Innovation

Methods, ideas, or system contributions that make the work stand out.

Regularizes the objective to make it differentiable

Applies the optimism principle, using a near-optimal oracle with an appropriate perturbation

Develops a practical online algorithm with bounds on the expected dynamic regret
