Dual Advantage Fields

📅 2026-06-02

🤖 AI Summary

Offline goal-conditioned reinforcement learning often lacks an effective local action-preference signal, so choosing precise actions remains hard even when long-horizon reachability is well estimated. This work proposes Dual Advantage Fields (DAF), a policy-extraction method that derives a local advantage signal from a bilinear dual value model. DAF interprets the goal embedding as the gradient of the value field with respect to the state representation, learns an action-effect model that predicts the discounted feature displacement induced by an action, and scores each action by how well that displacement aligns with the goal direction. In the realizable case, this score equals the goal-conditioned Bellman advantage, which yields a standard local policy-improvement guarantee. Empirically, DAF improves aggregate RLiable metrics on OGBench locomotion, manipulation, and puzzle tasks, and performs strongly where the locally correct action differs from moving directly toward the final goal.

📝 Abstract

Offline goal-conditioned reinforcement learning requires both long-horizon reachability estimates and local action comparisons. Dual goal representations provide value fields that capture global goal reachability, but they do not directly specify which action should be preferred at a given state. We propose Dual Advantage Fields (DAF), a policy-extraction method that turns a bilinear dual value model into a local advantage signal. Under a bilinear dual parameterization, the goal embedding is the gradient of the value field with respect to the state representation. DAF learns an action-effect model that predicts the discounted feature displacement induced by an action and scores actions by the alignment between this displacement and the goal direction. In the realizable case, this score equals the goal-conditioned Bellman advantage, yielding a standard local policy-improvement guarantee. On OGBench locomotion, manipulation, and puzzle tasks, DAF improves aggregate RLiable metrics and performs strongly in settings where locally correct actions differ from direct movement toward the final goal.
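
The scoring rule described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the value is taken to be V(s, g) = phi(s) · psi(g), the discounted feature displacement is taken to be gamma * E[phi(s') | s, a] - phi(s), and the learned action-effect model is replaced by a hypothetical table of expected next-state features (`phi_next_by_action`). Under these assumptions, the alignment score matches the discounted one-step change in value, which differs from the Bellman advantage only by a reward term that does not depend on the action.

```python
import numpy as np


def dual_value(phi_s, psi_g):
    """Assumed bilinear value V(s, g) = phi(s) . psi(g).

    Because V is linear in phi(s), its gradient with respect to phi(s) is psi(g).
    """
    return phi_s @ psi_g


def action_scores(phi_s, phi_next_by_action, psi_g, gamma=0.99):
    """Score actions by aligning a feature displacement with the goal direction.

    phi_next_by_action has shape (num_actions, d) and stands in for the output
    of a learned action-effect model (hypothetical; an oracle table here).
    """
    # Assumed form of the discounted feature displacement for each action.
    displacement = gamma * phi_next_by_action - phi_s  # (num_actions, d)
    # Alignment with the goal embedding, i.e. the gradient of the value field.
    return displacement @ psi_g  # (num_actions,)


rng = np.random.default_rng(0)
d, num_actions, gamma = 4, 3, 0.99

phi_s = rng.normal(size=d)                    # state representation phi(s)
psi_g = rng.normal(size=d)                    # goal embedding psi(g)
phi_next = rng.normal(size=(num_actions, d))  # expected phi(s') per action

scores = action_scores(phi_s, phi_next, psi_g, gamma)

# The same quantity computed directly from the bilinear value:
# gamma * V(s', g) - V(s, g) for each candidate action.
value_change = gamma * dual_value(phi_next, psi_g) - dual_value(phi_s, psi_g)

best_action = int(np.argmax(scores))
print("scores:", scores)
print("greedy action:", best_action)
```

Since the score and the one-step value change coincide in this toy setup, taking the argmax of the score picks the same action as a greedy one-step lookahead on the value field, without evaluating the value at each candidate next state.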

Problem

Research questions and friction points this paper is trying to address.

goal-conditioned reinforcement learning

offline reinforcement learning

action selection

value estimation

policy improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Advantage Fields

goal-conditioned reinforcement learning

bilinear dual value model

action-effect model

Bellman advantage
