🤖 AI Summary
Existing diffusion-based imitation learning approaches struggle to ensure contact stability and action reliability in robotic manipulation tasks with complex free-form surface constraints, primarily because they lack explicit geometric modeling. This work proposes a framework that embeds surface geometric constraints into a diffusion policy by encoding surface geometry, derived from human demonstrations, with a two-dimensional weighted Gaussian kernel function. The diffusion policy takes multimodal inputs, including real-time visual observations and robot state feedback, and infers structured, surface-aware action intentions. A similarity-based action mapping then converts these intentions into surface-constrained dynamic movement primitives (DMPs) for smooth, compliant execution. Evaluated on multiple surface manipulation tasks, the approach reports higher task success rates and better contact stability than existing techniques.
📝 Abstract
Diffusion-based imitation learning has driven rapid progress in dexterous robotic manipulation. However, these methods perform poorly on tasks with complex free-form surface constraints: they do not model surface geometry explicitly and do not guarantee dynamic feasibility, so their stochastically generated actions fail to align reliably with the surface or maintain stable contact. To address these limitations, we propose a surface constraint policy (SCP) that generates robot actions satisfying free-form surface constraints from human demonstrations and real-time visual observations. First, the surface geometry constraint is encoded with a two-dimensional weighted Gaussian kernel function derived from demonstrations. Building on this encoding, a diffusion-based policy infers task-level action intentions from multimodal sensory inputs, including visual observations and robot state feedback. These intentions are then transformed into surface-constrained dynamic movement primitives (DMPs) through a similarity-based action mapping method, enabling smooth and compliant motion execution. SCP thus produces both structured surface-geometric intent and dynamically admissible actions. We validate the method on multiple surface manipulation tasks and compare it with existing techniques; the experimental results show higher task success rates and better contact stability under surface constraints.
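To make the first step more concrete, the following is a minimal sketch of how a surface might be encoded as a weighted sum of two-dimensional Gaussian kernels fitted to demonstration points. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the height-field parameterization z = f(u, v), the fixed grid of kernel centers, the shared bandwidth `sigma`, the ridge-regularized least-squares fit, and the synthetic "demonstration" data. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_features(uv, centers, sigma):
    """Evaluate isotropic 2D Gaussian kernels at query points.

    uv:      (N, 2) array of surface parameter coordinates (u, v)
    centers: (K, 2) array of kernel centers
    returns: (N, K) feature matrix
    """
    diff = uv[:, None, :] - centers[None, :, :]   # (N, K, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)          # (N, K)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_surface(uv, z, centers, sigma, ridge=1e-6):
    """Fit kernel weights w so that z is approximately Phi(uv) @ w.

    Uses ridge-regularized least squares (an assumed fitting rule).
    """
    phi = gaussian_features(uv, centers, sigma)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ z)


def predict_height(uv, centers, sigma, weights):
    """Evaluate the weighted Gaussian kernel surface at (u, v) points."""
    return gaussian_features(uv, centers, sigma) @ weights


# Synthetic stand-in for demonstrated contact points on a free-form surface.
rng = np.random.default_rng(0)
demo_uv = rng.uniform(-1.0, 1.0, size=(400, 2))
demo_z = 0.2 * np.sin(np.pi * demo_uv[:, 0]) * np.cos(np.pi * demo_uv[:, 1])

# Assumed 7 x 7 grid of kernel centers with a shared bandwidth.
grid = np.linspace(-1.0, 1.0, 7)
centers = np.array([(u, v) for u in grid for v in grid])
sigma = 0.35

weights = fit_surface(demo_uv, demo_z, centers, sigma)
pred = predict_height(demo_uv, centers, sigma, weights)
rmse = float(np.sqrt(np.mean((pred - demo_z) ** 2)))
print(f"fit RMSE on demonstration points: {rmse:.5f}")
```

In a pipeline like the one described, the fitted weights would serve as a compact description of the surface that a downstream policy could condition on; how SCP actually uses the encoding is not specified in the abstract.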