ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key obstacles in continuous robotic control with flow-matching policies such as Rectified Flow and Shortcut Models (difficult online fine-tuning, high computational overhead, and instability at very few denoising steps), this paper proposes ReinFlow, the first framework to model flow policies as discrete-time Markov processes with exact likelihood computation. By injecting learnable noise into the policy's deterministic denoising path and optimizing it with a likelihood-based online reinforcement learning objective, ReinFlow trains stably and deploys even at minimal step counts, including one-step inference. On legged locomotion tasks, ReinFlow raises the episode reward of Rectified Flow policies by an average net 135.36% while saving 82.63% of wall time relative to the diffusion RL fine-tuning baseline DPPO; on manipulation tasks, it boosts Shortcut Model success rates by an average net 40.34% and cuts computation time by 23.20% on average, matching fine-tuned DDIM performance. The work supplies both a theoretical foundation and an efficient practical recipe for integrating flow matching with reinforcement learning.

📝 Abstract
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow on representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. After fine-tuning on challenging legged locomotion tasks, the episode reward of Rectified Flow policies grew by an average net 135.36% while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of Shortcut Model policies on state- and visual-input manipulation tasks increased by an average net 40.34% after fine-tuning with ReinFlow at four or even one denoising step, matching fine-tuned DDIM policies while saving an average of 23.20% in computation time. Project webpage: https://reinflow.github.io/
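
The conversion at the heart of ReinFlow is compact enough to sketch. The snippet below is a minimal illustration in plain PyTorch, not the authors' released code: it shows one way a deterministic Euler denoising step can be wrapped into a Gaussian transition with a learnable noise head so that each step's likelihood is exact. The module names `velocity_net` and `noise_net` are assumptions for illustration.

```python
import torch

class NoisyFlowStep(torch.nn.Module):
    """One denoising step of a flow policy with injected learnable noise."""

    def __init__(self, velocity_net, noise_net):
        super().__init__()
        self.velocity_net = velocity_net  # predicts the flow velocity v(a_t, t, obs)
        self.noise_net = noise_net        # predicts a state-dependent log-std

    def forward(self, a_t, t, dt, obs):
        # Deterministic Euler step along the learned flow ...
        mean = a_t + dt * self.velocity_net(a_t, t, obs)
        # ... perturbed by learnable Gaussian noise, which turns the
        # denoising chain into a discrete-time Markov process.
        std = self.noise_net(a_t, t, obs).exp()
        dist = torch.distributions.Normal(mean, std)
        a_next = dist.rsample()
        # Exact per-step log-likelihood; summed over the K denoising
        # steps, this yields log pi(a | obs) for likelihood-based RL.
        log_prob = dist.log_prob(a_next).sum(-1)
        return a_next, log_prob
```

Because every transition is an explicit Gaussian, the chain's log-likelihood is a plain sum of per-step terms; no ODE inversion or variational bound is needed.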
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning flow matching policies for robotic control
Enhancing exploration and training stability in RL
Improving performance with fewer denoising steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online RL fine-tunes flow matching policies (see the sketch after this list)
Injects learnable noise for exact likelihood computation
Benchmarks show significant reward gains and time savings
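
To make the RL side concrete, here is a hypothetical fine-tuning fragment that assumes the `NoisyFlowStep` sketch above. A PPO-style clipped surrogate is shown as one plausible likelihood-based objective; the paper's exact loss and hyperparameters may differ.

```python
import torch

def sample_action(step_fn, obs, num_steps, a0):
    """Roll the K-step denoising chain from noise a0; log pi(a | obs)
    is the sum of the exact per-step Gaussian log-probabilities."""
    a, total_logp = a0, torch.zeros(a0.shape[0])
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((a.shape[0], 1), k * dt)
        a, logp = step_fn(a, t, dt, obs)
        total_logp = total_logp + logp
    return a, total_logp

def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped policy-gradient loss on the chain likelihood (PPO-style)."""
    ratio = (logp_new - logp_old).exp()
    return -torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage,
    ).mean()
```

With `num_steps` as low as 1, the same code path covers one-step inference, which matches the very-few-step regime the paper targets.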
👥 Authors
Tonghe Zhang
Department of Electronic Engineering, Tsinghua University
Chao Yu
Department of Electronic Engineering, Tsinghua University
Sichang Su
PhD student, University of Texas at Austin
Reinforcement Learning · Robot Learning
Yu Wang
Department of Electronic Engineering, Tsinghua University