CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative policies struggle to simultaneously achieve representational expressiveness, real-time performance, and stability in long-horizon robotic manipulation. This work proposes LG-Flow Policy, which introduces flow matching for the first time in a continuous latent action space. By employing temporally regularized trajectory encoding, the method decouples global motion structure from low-level control noise, while integrating geometric-aware point cloud conditioning and multimodal execution modulation. Evaluated on both simulated and real-world platforms, LG-Flow Policy achieves near single-step inference latency and significantly outperforms baseline flow-matching approaches formulated in the original action space in terms of trajectory smoothness and task success rate, while also demonstrating greater computational efficiency than diffusion-based policies.
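The "near single-step inference latency" claim rests on a standard property of straight-path flow matching: when the latent probability path is (near) linear, the learned velocity field is approximately constant in time, so one Euler step from noise suffices. A minimal numpy sketch of that idealized case (illustrative only; the velocity function, latent dimension, and target are assumptions, not the paper's trained model):

```python
import numpy as np

# Idealized single-step latent flow inference, NOT the authors' code:
# with a straight (rectified) probability path, the exact velocity field
# is constant along each trajectory, so one Euler step over [0, 1]
# carries a noise sample z0 to the target latent.
rng = np.random.default_rng(42)
latent_dim = 8

# Stand-in for the learned conditional velocity network v_theta(z, t | obs):
# here the closed-form velocity toward a hypothetical target latent.
z_target = rng.normal(size=latent_dim)

def v_theta(z, t):
    # For the straight path ending at z_target, the exact velocity at
    # (z, t) is (z_target - z) / (1 - t); at t = 0 this is z_target - z0.
    return (z_target - z) / (1.0 - t)

z0 = rng.normal(size=latent_dim)        # sample latent noise
z1_hat = z0 + 1.0 * v_theta(z0, 0.0)    # one Euler step of size 1
# In this idealized setting z1_hat recovers z_target exactly; a trained
# network only approximates this, which is why the summary says "near"
# single-step.
```

In practice the constant-velocity assumption holds only approximately, so a trained policy may still use a few Euler steps; the sketch shows the limiting case the latency claim appeals to.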

📝 Abstract
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches provide strong modeling capacity but typically incur high inference latency, while flow matching enables fast one-step generation yet often leads to unstable execution when applied directly in the raw action space. We propose LG-Flow Policy, a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally regularized latent trajectories and learning an explicit latent-space flow, the proposed approach decouples global motion structure from low-level control noise, resulting in smooth and reliable long-horizon execution. LG-Flow Policy further incorporates geometry-aware point cloud conditioning and execution-time multimodal modulation, with visual cues evaluated as a representative modality in real-world settings. Experimental results in simulation and on physical robot platforms demonstrate that LG-Flow Policy achieves near single-step inference, substantially improves trajectory smoothness and task success over flow-based baselines operating in the raw action space, and remains significantly more efficient than diffusion-based policies.
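The abstract's core recipe, encoding an action chunk into a latent trajectory and learning an explicit latent-space flow, can be sketched as a single flow-matching training pair. This is a toy illustration, not the paper's architecture: the random linear encoder, dimensions, and straight (rectified) probability path are all assumptions standing in for the learned components.

```python
import numpy as np

# Toy construction of one flow-matching training pair in a continuous
# latent action space (illustrative assumptions throughout).
rng = np.random.default_rng(0)
action_dim, horizon, latent_dim = 7, 16, 8

# Stand-in for the temporally regularized trajectory encoder: a fixed
# random linear map from a flattened action chunk to a latent vector.
W_enc = rng.normal(size=(action_dim * horizon, latent_dim))
W_enc /= np.sqrt(action_dim * horizon)

demo_actions = rng.normal(size=(horizon, action_dim))  # one demo chunk
z1 = demo_actions.reshape(-1) @ W_enc                  # latent endpoint

# Conditional flow matching with a straight probability path:
# z_t = (1 - t) * z0 + t * z1, whose velocity is the constant z1 - z0.
z0 = rng.normal(size=latent_dim)   # latent noise sample
t = rng.uniform()                  # random time in [0, 1]
z_t = (1.0 - t) * z0 + t * z1      # interpolated point on the path
v_target = z1 - z0                 # regression target for v_theta(z_t, t | obs)
```

A learned velocity network conditioned on observations (e.g. the paper's geometry-aware point cloud features) would be trained by regressing `v_target` at `(z_t, t)` with an MSE loss; operating on `z` rather than raw actions is what the abstract credits with decoupling global motion structure from low-level control noise.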
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
long-horizon tasks
generative policies
real-time inference
execution stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent action space
flow matching
temporal coherence
imitation learning
robotic manipulation
Songwei Wu
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Zhiduo Jiang
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Guanghu Xie
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Liu Yang
State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
Liu Hong
Associate professor, Institute of System Engineering, Huazhong University of Science and Technology