Auction-Based Online Policy Adaptation for Evolving Objectives

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of online policy adaptation in multi-objective reinforcement learning when objectives are dynamically added or removed. The authors propose a modular framework in which each objective is modeled by an independent local policy, treated as a competitive agent in a general-sum game. Action selection is coordinated through an auction-based mechanism: policies bid according to the urgency of their respective objectives in the current state, and the highest bidder executes its action. Leveraging structural commonalities among related objectives, this approach enables plug-and-play addition or removal of policies at runtime without requiring full retraining. A calibrated bidding mechanism ensures that trade-offs are interpretable and reflect objective priorities. Experiments on Atari Assault and a dynamic multi-objective grid-world pathfinding task demonstrate that the method significantly outperforms monolithic PPO baselines, validating its effectiveness and adaptability.
📝 Abstract
We consider multi-objective reinforcement learning problems where objectives come from an identical family -- such as the class of reachability objectives -- and may appear or disappear at runtime. Our goal is to design adaptive policies that can efficiently adjust their behaviors as the set of active objectives changes. To solve this problem, we propose a modular framework where each objective is supported by a selfish local policy, and coordination is achieved through a novel auction-based mechanism: policies bid for the right to execute their actions, with bids reflecting the urgency of the current state. The highest bidder selects the action, enabling a dynamic and interpretable trade-off among objectives. Going back to the original adaptation problem, when objectives change, the system adapts by simply adding or removing the corresponding policies. Moreover, as objectives arise from the same family, identical copies of a parameterized policy can be deployed, facilitating immediate adaptation at runtime. We show how the selfish local policies can be computed by turning the problem into a general-sum game, where the policies compete against each other to fulfill their own objectives. To succeed, each policy must not only optimize its own objective, but also reason about the presence of other goals and learn to produce calibrated bids that reflect relative priority. In our implementation, the policies are trained concurrently using proximal policy optimization (PPO). We evaluate on Atari Assault and a gridworld-based path-planning task with dynamic targets. Our method achieves substantially better performance than monolithic policies trained with PPO.
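The core coordination idea above — independent per-objective policies that bid according to state urgency, with the highest bidder executing its action — can be illustrated with a minimal sketch. The class and function names (`ObjectivePolicy`, `auction_step`) and the hand-coded distance-based urgency are illustrative assumptions; in the paper the policies and their calibrated bids are learned with PPO.

```python
class ObjectivePolicy:
    """Hypothetical selfish local policy for one reachability objective.

    The bid is a hand-coded urgency (distance to goal) for illustration;
    the paper instead trains calibrated bids with PPO in a general-sum game.
    """

    def __init__(self, name, goal):
        self.name = name
        self.goal = goal  # target cell in a 1-D gridworld

    def bid(self, state):
        # Urgency grows as the agent drifts away from this objective's goal.
        return abs(self.goal - state)

    def act(self, state):
        # Greedy step toward the goal: -1, 0, or +1.
        return (self.goal > state) - (self.goal < state)


def auction_step(policies, state):
    """Auction-based coordination: the highest bidder executes its action."""
    winner = max(policies, key=lambda p: p.bid(state))
    return winner, winner.act(state)


# Objectives can be added or removed at runtime (plug-and-play),
# since each one is just another policy entering or leaving the auction.
policies = [
    ObjectivePolicy("reach_left", goal=0),
    ObjectivePolicy("reach_right", goal=10),
]
winner, action = auction_step(policies, state=2)
print(winner.name, action)  # reach_right is the more urgent objective -> +1
```

Because coordination happens only through bids, adapting to a changed objective set reduces to editing the `policies` list — no retraining of the remaining policies is required under this scheme.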
Problem

Research questions and friction points this paper is trying to address.

multi-objective reinforcement learning
online policy adaptation
dynamic objectives
auction-based coordination
modular policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

auction-based coordination
modular reinforcement learning
dynamic multi-objective adaptation
selfish local policies
online policy adaptation
Guruprerana Shabadi
University of Pennsylvania, United States
Kaushik Mallik
IMDEA Software Institute
Formal verification · Reactive synthesis · Hybrid systems