CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative policies struggle to simultaneously achieve representational expressiveness, real-time performance, and stability in long-horizon robotic manipulation. This work proposes LG-Flow Policy, which introduces flow matching for the first time in a continuous latent action space. By employing temporally regularized trajectory encoding, the method decouples global motion structure from low-level control noise, while integrating geometric-aware point cloud conditioning and multimodal execution modulation. Evaluated on both simulated and real-world platforms, LG-Flow Policy achieves near single-step inference latency and significantly outperforms baseline flow-matching approaches formulated in the original action space in terms of trajectory smoothness and task success rate, while also demonstrating greater computational efficiency than diffusion-based policies.
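The "near single-step inference latency" claim rests on a standard property of straight-path flow matching: when the latent probability path is (near) linear, the learned velocity field is approximately constant in time, so one Euler step from noise suffices. A minimal numpy sketch of that idealized case (illustrative only; the velocity function, latent dimension, and target are assumptions, not the paper's trained model):

```python
import numpy as np

# Idealized single-step latent flow inference, NOT the authors' code:
# with a straight (rectified) probability path, the exact velocity field
# is constant along each trajectory, so one Euler step over [0, 1]
# carries a noise sample z0 to the target latent.
rng = np.random.default_rng(42)
latent_dim = 8

# Stand-in for the learned conditional velocity network v_theta(z, t | obs):
# here the closed-form velocity toward a hypothetical target latent.
z_target = rng.normal(size=latent_dim)

def v_theta(z, t):
    # For the straight path ending at z_target, the exact velocity at
    # (z, t) is (z_target - z) / (1 - t); at t = 0 this is z_target - z0.
    return (z_target - z) / (1.0 - t)

z0 = rng.normal(size=latent_dim)        # sample latent noise
z1_hat = z0 + 1.0 * v_theta(z0, 0.0)    # one Euler step of size 1
# In this idealized setting z1_hat recovers z_target exactly; a trained
# network only approximates this, which is why the summary says "near"
# single-step.
```

In practice the constant-velocity assumption holds only approximately, so a trained policy may still use a few Euler steps; the sketch shows the limiting case the latency claim appeals to.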

📝 Abstract
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches provide strong modeling capacity but typically incur high inference latency, while flow matching enables fast one-step generation yet often leads to unstable execution when applied directly in the raw action space. We propose LG-Flow Policy, a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally regularized latent trajectories and learning an explicit latent-space flow, the proposed approach decouples global motion structure from low-level control noise, resulting in smooth and reliable long-horizon execution. LG-Flow Policy further incorporates geometry-aware point cloud conditioning and execution-time multimodal modulation, with visual cues evaluated as a representative modality in real-world settings. Experimental results in simulation and on physical robot platforms demonstrate that LG-Flow Policy achieves near single-step inference, substantially improves trajectory smoothness and task success over flow-based baselines operating in the raw action space, and remains significantly more efficient than diffusion-based policies.
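The abstract's core recipe, encoding an action chunk into a latent trajectory and learning an explicit latent-space flow, can be sketched as a single flow-matching training pair. This is a toy illustration, not the paper's architecture: the random linear encoder, dimensions, and straight (rectified) probability path are all assumptions standing in for the learned components.

```python
import numpy as np

# Toy construction of one flow-matching training pair in a continuous
# latent action space (illustrative assumptions throughout).
rng = np.random.default_rng(0)
action_dim, horizon, latent_dim = 7, 16, 8

# Stand-in for the temporally regularized trajectory encoder: a fixed
# random linear map from a flattened action chunk to a latent vector.
W_enc = rng.normal(size=(action_dim * horizon, latent_dim))
W_enc /= np.sqrt(action_dim * horizon)

demo_actions = rng.normal(size=(horizon, action_dim))  # one demo chunk
z1 = demo_actions.reshape(-1) @ W_enc                  # latent endpoint

# Conditional flow matching with a straight probability path:
# z_t = (1 - t) * z0 + t * z1, whose velocity is the constant z1 - z0.
z0 = rng.normal(size=latent_dim)   # latent noise sample
t = rng.uniform()                  # random time in [0, 1]
z_t = (1.0 - t) * z0 + t * z1      # interpolated point on the path
v_target = z1 - z0                 # regression target for v_theta(z_t, t | obs)
```

A learned velocity network conditioned on observations (e.g. the paper's geometry-aware point cloud features) would be trained by regressing `v_target` at `(z_t, t)` with an MSE loss; operating on `z` rather than raw actions is what the abstract credits with decoupling global motion structure from low-level control noise.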
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
long-horizon tasks
generative policies
real-time inference
execution stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent action space
flow matching
temporal coherence
imitation learning
robotic manipulation
Songwei Wu
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Zhiduo Jiang
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Guanghu Xie
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Liu Yang
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Liu Hong
Associate professor, Institute of System Engineering, Huazhong University of Science and Technology