Improving Robotic Generalist Policies via Flow Reversal Steering

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

When generalist robotic policies fail to respond directly to semantic instructions, extracting high-quality actions for new tasks remains difficult. This work proposes Flow Reversal Steering (FRS), which applies reverse noise mapping to flow-matching generalist policies. FRS infers the latent noise behind suboptimal but plausible actions by running the flow policy in reverse, then maps that noise to nearby action modes of the generalist, translating coarse semantic guidance into executable controls. The method supports zero-shot control, behavior cloning distillation, and semantically guided reinforcement learning. Experiments in simulated and real-world manipulation settings show improved task success: behavior cloning distillation yields absolute success-rate gains of up to 95% in under a minute of training, and FRS improves performance on several tasks that standard reinforcement learning fails to improve.

📝 Abstract

Generalist policies can learn a wide range of skills from diverse robot datasets. To solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching generalists and propose Flow Reversal Steering (FRS), a method that takes suboptimal but "reasonable" actions, finds their latent noises by passing them through the flow policy in reverse, and maps them to nearby generalist action modes. We evaluate FRS across many simulated and real-world manipulation settings. First, FRS can turn coarse semantic guidance from humans or vision-language models (VLMs) into corresponding good robot actions, improving zero-shot control. Second, these gains can be distilled with behavioral cloning by training an auxiliary policy to output noises that the generalist maps to good actions, boosting absolute task success rates by up to 95% in under a minute of training. Finally, FRS enables policy improvement by bootstrapping reinforcement learning with semantic knowledge, improving on several tasks that standard RL fails to improve on.
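
To make the reverse-then-forward idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. Everything in it is an assumption made for illustration: the one-dimensional `velocity` field (a stand-in for a trained flow policy with action modes near -1 and +1), the Euler integrators `generate` and `invert`, and in particular the `np.clip` step in `steer`. The abstract does not say how the recovered noise is mapped to a nearby mode, so clipping the noise to a typical range of the prior is only a placeholder for that step.

```python
import numpy as np

K = 2.0        # hypothetical contraction rate of the toy flow
N_STEPS = 200  # Euler steps used in both directions


def velocity(a):
    """Toy velocity field whose stable fixed points (action modes) sit near -1 and +1."""
    return K * (np.tanh(3.0 * a) - a)


def generate(noise):
    """Forward pass: integrate latent noise (t=0) to an action (t=1) with Euler steps."""
    a = np.asarray(noise, dtype=float).copy()
    dt = 1.0 / N_STEPS
    for _ in range(N_STEPS):
        a = a + dt * velocity(a)
    return a


def invert(action):
    """Reverse pass: integrate an action (t=1) back to its latent noise (t=0)."""
    a = np.asarray(action, dtype=float).copy()
    dt = 1.0 / N_STEPS
    for _ in range(N_STEPS):
        a = a - dt * velocity(a)
    return a


def steer(action, max_noise=2.0):
    """Invert a rough action, pull its noise into a typical range, and regenerate.

    The clipping is a placeholder assumption, not the method from the paper.
    """
    z = invert(action)
    z = np.clip(z, -max_noise, max_noise)
    return generate(z)


rough = np.array([1.6, -1.7])   # suboptimal but "reasonable" toy actions
steered = steer(rough)          # actions regenerated from the adjusted noise
round_trip = generate(invert(rough))  # without the adjustment, roughly the input
```

In this toy, inverting and regenerating without any change to the noise approximately reproduces the rough action, so whatever steering happens must come from how the recovered noise is adjusted before the forward pass. Here the clipped noise regenerates to actions that lie closer to the nearest mode than the rough inputs did.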

Problem

Research questions and friction points this paper is trying to address.

generalist policies

robotic manipulation

zero-shot control

action inference

policy improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Reversal Steering

generalist policies

flow matching