Auction-Based Online Policy Adaptation for Evolving Objectives

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of online policy adaptation in multi-objective reinforcement learning when objectives are dynamically added or removed. The authors propose a modular framework in which each objective is modeled by an independent local policy, treated as a competitive agent in a general-sum game. Action selection is coordinated through an auction-based mechanism: policies bid according to the urgency of their respective objectives in the current state, and the highest bidder executes its action. Leveraging structural commonalities among related objectives, this approach enables plug-and-play addition or removal of policies at runtime without requiring full retraining. A calibrated bidding mechanism ensures that trade-offs are interpretable and reflect objective priorities. Experiments on Atari Assault and a dynamic multi-objective grid-world pathfinding task demonstrate that the method significantly outperforms monolithic PPO baselines, validating its effectiveness and adaptability.
📝 Abstract
We consider multi-objective reinforcement learning problems where objectives come from an identical family -- such as the class of reachability objectives -- and may appear or disappear at runtime. Our goal is to design adaptive policies that can efficiently adjust their behaviors as the set of active objectives changes. To solve this problem, we propose a modular framework where each objective is supported by a selfish local policy, and coordination is achieved through a novel auction-based mechanism: policies bid for the right to execute their actions, with bids reflecting the urgency of the current state. The highest bidder selects the action, enabling a dynamic and interpretable trade-off among objectives. Going back to the original adaptation problem, when objectives change, the system adapts by simply adding or removing the corresponding policies. Moreover, as objectives arise from the same family, identical copies of a parameterized policy can be deployed, facilitating immediate adaptation at runtime. We show how the selfish local policies can be computed by turning the problem into a general-sum game, where the policies compete against each other to fulfill their own objectives. To succeed, each policy must not only optimize its own objective, but also reason about the presence of other goals and learn to produce calibrated bids that reflect relative priority. In our implementation, the policies are trained concurrently using proximal policy optimization (PPO). We evaluate on Atari Assault and a gridworld-based path-planning task with dynamic targets. Our method achieves substantially better performance than monolithic policies trained with PPO.
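The core coordination idea above — independent per-objective policies that bid according to state urgency, with the highest bidder executing its action — can be illustrated with a minimal sketch. The class and function names (`ObjectivePolicy`, `auction_step`) and the hand-coded distance-based urgency are illustrative assumptions; in the paper the policies and their calibrated bids are learned with PPO.

```python
class ObjectivePolicy:
    """Hypothetical selfish local policy for one reachability objective.

    The bid is a hand-coded urgency (distance to goal) for illustration;
    the paper instead trains calibrated bids with PPO in a general-sum game.
    """

    def __init__(self, name, goal):
        self.name = name
        self.goal = goal  # target cell in a 1-D gridworld

    def bid(self, state):
        # Urgency grows as the agent drifts away from this objective's goal.
        return abs(self.goal - state)

    def act(self, state):
        # Greedy step toward the goal: -1, 0, or +1.
        return (self.goal > state) - (self.goal < state)


def auction_step(policies, state):
    """Auction-based coordination: the highest bidder executes its action."""
    winner = max(policies, key=lambda p: p.bid(state))
    return winner, winner.act(state)


# Objectives can be added or removed at runtime (plug-and-play),
# since each one is just another policy entering or leaving the auction.
policies = [
    ObjectivePolicy("reach_left", goal=0),
    ObjectivePolicy("reach_right", goal=10),
]
winner, action = auction_step(policies, state=2)
print(winner.name, action)  # reach_right is the more urgent objective -> +1
```

Because coordination happens only through bids, adapting to a changed objective set reduces to editing the `policies` list — no retraining of the remaining policies is required under this scheme.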
Problem

Research questions and friction points this paper is trying to address.

multi-objective reinforcement learning
online policy adaptation
dynamic objectives
auction-based coordination
modular policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

auction-based coordination
modular reinforcement learning
dynamic multi-objective adaptation
selfish local policies
online policy adaptation
Guruprerana Shabadi
University of Pennsylvania, United States
Kaushik Mallik
IMDEA Software Institute
Formal verification · Reactive synthesis · Hybrid systems