🤖 AI Summary
Existing one-step generative imitation learning methods discard the intermediate trajectory evolution that provides action correction, so they struggle to constrain generated actions to the valid action manifold and lose control accuracy. This work proposes Implicit Drifting Policy (IDP), a one-step action generation framework that avoids explicit drift field estimation: it compares the local conditional expert geometry with a global reference geometry and uses the result to adaptively weight a scalar potential function, which corrects drift implicitly. To our knowledge, this is the first approach to incorporate an implicit drift-correction mechanism during training within a one-step policy, circumventing the ill-posedness of explicit vector field estimation under sparse conditional demonstrations. Experiments on 2D, 3D, and real-world robotic tasks show that the method improves upon explicit drift-based approaches while remaining competitive with strong one-step baselines, improving both the accuracy and stability of action generation.
📝 Abstract
Generative action policies based on diffusion or flow matching excel in behavior cloning, yet their iterative sampling is prohibitive for high-frequency robot control. While recent one-step formulations alleviate this latency, they inevitably discard the intermediate trajectory evolution that provides crucial action correction. Directly recovering this mechanism by explicitly estimating a training-time drifting field is mathematically ill-posed due to extreme conditional demonstration sparsity. We introduce Implicit Drifting Policy (IDP), a one-step imitation learning framework that brings the training-time correction of Drifting into policy learning without explicit vector field estimation. IDP extracts a conditional expert geometry from the local variation of observation-similar expert actions, and compares it against a global reference geometry to isolate condition-specific constraints. This local geometric structure adaptively weights a scalar potential objective. Combined with an expert-proximal terminal evaluation, IDP directly enforces manifold constraints on the one-step generator during training. Extensive evaluations across 2D, 3D, and real-world manipulation tasks show IDP effectively maintains adherence to valid action manifolds, improving upon explicit drifting methods and achieving competitive performance with strong one-step baselines.
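The geometric weighting described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the k-nearest-neighbor lookup, the per-dimension variance ratio used as the "geometry comparison", and the nearest-neighbor squared distance used as the "expert-proximal" term are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def local_geometry_weights(obs, expert_obs, expert_act, k=8, eps=1e-6):
    """Hypothetical per-dimension weights from local vs. global action spread."""
    # Indices of the k expert samples whose observations are closest to `obs`.
    dists = np.linalg.norm(expert_obs - obs, axis=1)
    idx = np.argsort(dists)[:k]
    local_var = expert_act[idx].var(axis=0)   # assumed "conditional expert geometry"
    global_var = expert_act.var(axis=0)       # assumed "global reference geometry"
    # Dimensions that are tight locally relative to the global spread
    # are treated as condition-specific constraints and weighted up.
    w = (global_var + eps) / (local_var + eps)
    return w / w.sum(), idx

def scalar_potential(action, obs, expert_obs, expert_act, k=8):
    """Hypothetical scalar potential: weighted squared distance to the
    nearest observation-similar expert action (no vector field is estimated)."""
    w, idx = local_geometry_weights(obs, expert_obs, expert_act, k)
    diffs = expert_act[idx] - action
    return float(np.min((w * diffs ** 2).sum(axis=1)))

# Toy data: action dim 0 is determined by the observation, dim 1 is free.
rng = np.random.default_rng(0)
expert_obs = rng.uniform(-1.0, 1.0, size=(500, 1))
expert_act = np.column_stack([expert_obs[:, 0], rng.uniform(-1.0, 1.0, size=500)])
query = np.array([0.0])
weights, neighbors = local_geometry_weights(query, expert_obs, expert_act)
```

In this toy setup the constrained dimension receives most of the weight, so an action that leaves the local expert set along that dimension is penalized more than one that moves the same distance along the free dimension. In a training loop, a potential of this kind would be evaluated on the one-step generator's output and minimized; how IDP actually constructs and optimizes its objective is specified in the paper, not here.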