ACG: Action Coherence Guidance for Flow-based VLA models

📅 2025-10-25
🤖 AI Summary
Action noise—such as jitter and pauses—in human demonstrations degrades trajectory coherence in flow-matching-based vision-language-action (VLA) models, leading to deployment instability and failure in fine-grained manipulation. To address this, we propose a **training-free, test-time action coherence guidance method** that dynamically refines action sequences during inference to enhance smoothness and temporal consistency, significantly improving robustness to demonstration noise. Our approach is framework-agnostic, seamlessly integrating with both diffusion and flow-matching VLA architectures without introducing additional parameters or training overhead. We evaluate it on RoboCasa, DexMimicGen, and real-world SO-101 tasks, demonstrating substantial improvements in action coherence metrics and task success rates. The method provides a lightweight, general-purpose, plug-and-play stability enhancement for practical VLA deployment.
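To make the idea of training-free, test-time coherence guidance concrete, here is a minimal sketch of one plausible guidance rule: during each sampler step, the model's predicted velocity is blended with a velocity that points toward a temporally smoothed version of the action chunk. This is an illustrative assumption, not the paper's actual ACG algorithm; the function names (`smooth`, `coherence_guided_step`) and the moving-average target are hypothetical.

```python
import numpy as np

def smooth(actions, k=3):
    """Temporal moving average over an action chunk of shape (H, D)."""
    pad = k // 2
    padded = np.pad(actions, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(k) / k
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid")
         for d in range(actions.shape[1])],
        axis=1,
    )

def coherence_guided_step(x, v, dt, w=0.3):
    """One guided Euler step of a flow-matching sampler (hypothetical rule).

    x : (H, D) current action chunk
    v : (H, D) velocity predicted by the policy
    w : guidance weight in [0, 1]; w = 0 recovers the unguided sampler
    """
    x_plain = x + dt * v                    # unguided Euler update
    v_smooth = (smooth(x_plain) - x) / dt   # velocity toward a smoothed chunk
    v_guided = (1 - w) * v + w * v_smooth   # blend in coherence guidance
    return x + dt * v_guided
```

Because the guidance only modifies the sampling update, it adds no parameters and needs no retraining, which is what makes this family of methods plug-and-play across diffusion and flow-matching policies.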

📝 Abstract
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.
Problem

Research questions and friction points this paper is trying to address.

Improving action coherence in vision-language-action models
Reducing trajectory drift during robotic manipulation deployment
Mitigating noise sensitivity in imitation learning policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free test-time guidance algorithm for VLA models
Improves action coherence in vision-language-action policies
Reduces trajectory drift and boosts manipulation success rates