Flow-based Policy Adaptation without Policy Updates

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the suboptimal, noisy, or expert-misaligned actions that non-expert agents, such as pretrained policies or human operators, often produce. It introduces GLOVES, a family of flow-based methods that transport the non-expert action distribution toward the expert action distribution. GLOVES corrects actions selectively while preserving the agent's original intent, and it uses reverse flow evaluation to decide when to intervene. Its key contributions are action-level adaptation that leaves the original policy unmodified, a flow-based in-distribution score that serves as an intervention gate so that assistance is provided only when necessary, and a lightweight shared-control module that works across tasks and environments. Experiments show that, with only a small number of expert demonstrations, GLOVES substantially improves task success rates while preserving the agent's original behavioral intent.

📝 Abstract

Leveraging prior knowledge from pretrained policies, foundation models, or human operators offers an efficient alternative to learning robot skills from scratch. However, these agents often provide actions that are suboptimal, noisy, or misaligned with task-specific expert behavior. We propose GLOVES, a family of flow-based adaptation methods that correct non-expert actions by transporting them toward an expert action distribution. Rather than replacing agentic control with full autonomy, GLOVES performs selective action-level adaptation, improving task success while preserving agent intent. The learned flow also provides a natural in-distribution scoring mechanism through reverse flow evaluation. We use this signal as an intervention gate: actions that appear consistent with the expert distribution are passed through unchanged, while anomalous or out-of-distribution (OOD) actions are corrected. In this way, assistance is only provided when necessary. GLOVES requires only limited expert supervision, using a small number of demonstrations or reusable successful skill segments. By learning local expert action patterns and stitching them during execution, GLOVES provides a lightweight shared-control module for robust action adaptation across tasks and environments. Code and demos are available at ripl.github.io/GLOVES_web.
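
The gate-then-correct idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `velocity` field stands in for a learned flow model, the constants `EXPERT_MEAN`, `RATE`, and `threshold` are hypothetical, and scoring an action by integrating the flow backward and measuring how far it drifts is only one plausible reading of "reverse flow evaluation".

```python
import numpy as np

# Hypothetical constants for illustration only; a real system would learn
# the velocity field from expert demonstrations.
EXPERT_MEAN = np.array([0.5, -0.2])  # assumed centre of expert actions
RATE = 2.0                           # assumed contraction rate of the toy flow


def velocity(action, t):
    """Toy stand-in for a learned velocity field: pulls actions toward EXPERT_MEAN."""
    return RATE * (EXPERT_MEAN - action)


def integrate(action, t_start, t_end, steps=50):
    """Euler-integrate da/dt = velocity(a, t) from t_start to t_end."""
    a = np.asarray(action, dtype=float).copy()
    dt = (t_end - t_start) / steps  # negative when integrating backward
    t = t_start
    for _ in range(steps):
        a = a + dt * velocity(a, t)
        t += dt
    return a


def ood_score(action):
    """Assumed scoring rule: run the flow backward (t=1 to t=0) and measure
    how far the recovered point lies from the expert centre."""
    recovered = integrate(action, 1.0, 0.0)
    return float(np.linalg.norm(recovered - EXPERT_MEAN))


def adapt(action, threshold=3.0):
    """Pass in-distribution actions through unchanged; correct the rest.

    Returns (action_to_execute, intervened).
    """
    action = np.asarray(action, dtype=float)
    if ood_score(action) <= threshold:
        return action, False
    return integrate(action, 0.0, 1.0), True


if __name__ == "__main__":
    for raw in (np.array([0.6, -0.1]), np.array([2.0, 1.5])):
        out, intervened = adapt(raw)
        print(raw, "->", out, "intervened" if intervened else "passed through")
```

With these toy constants, an action close to the assumed expert centre passes through unchanged, while a distant one is pulled toward that centre. Because the correction moves the action along the flow instead of replacing it outright, the corrected action keeps its direction relative to the centre, a loose analogue of preserving agent intent.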

Problem

Research questions and friction points this paper is trying to address.

policy adaptation

flow-based models

expert demonstration

out-of-distribution correction

shared control

Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-based adaptation

selective action correction

out-of-distribution detection