🤖 AI Summary
In robotic cloth manipulation, large deformations and material variability degrade action accuracy and real-time performance. To address this, we propose an end-to-end vision-to-action mapping method based on *Action Fields*, which directly regresses pixel-wise end-effector action vectors from a single RGB scene image while jointly predicting a spatial manipulation confidence map—enabling interpretable and filterable action generation. Our key innovations include: (1) formulating *Action Fields* to unify the representation of continuous spatial action distributions; and (2) designing a dual-branch CNN architecture that jointly optimizes action regression and spatial confidence estimation. In simulation, our method improves garment unfolding/alignment success rates by 12.7% and accelerates inference by 3.8× over baselines. Real-world experiments demonstrate robust flattening performance across diverse textiles—including T-shirts and towels—substantially enhancing generalization and deployment practicality.
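To make the dual-branch idea concrete, here is a minimal sketch of such a network: a shared convolutional encoder over the RGB scene image feeding two decoder heads, one regressing a per-pixel end-effector action vector and one predicting a per-pixel manipulation confidence score. The class name, backbone, 2-channel action representation, and all layer sizes are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a dual-branch action-field network (assumed details).
import torch
import torch.nn as nn

class ActionFieldNet(nn.Module):
    def __init__(self, action_channels: int = 2):
        super().__init__()
        # Shared convolutional encoder over the RGB scene image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        # Branch 1: regresses a per-pixel end-effector action vector.
        self.action_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, action_channels, 4, stride=2, padding=1),
        )
        # Branch 2: predicts a per-pixel manipulation confidence score in [0, 1].
        self.score_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor):
        feats = self.encoder(rgb)               # (B, 128, H/4, W/4)
        action_field = self.action_head(feats)  # (B, C, H, W) action vectors
        score_map = self.score_head(feats)      # (B, 1, H, W) confidence map
        return action_field, score_map
```

Because both heads share the encoder and are trained together, action regression and spatial confidence estimation are jointly optimized, matching the joint formulation described above.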
📝 Abstract
Garment manipulation using robotic systems is a challenging task due to the diverse shapes and deformable nature of fabric. In this paper, we propose a novel method for robotic garment manipulation that significantly improves accuracy while reducing computation time compared to previous approaches. Our method features an action generator that directly interprets scene images and generates pixel-wise end-effector action vectors using a neural network. The network also predicts a manipulation score map that ranks potential actions, allowing the system to select the most effective action. Extensive simulation experiments demonstrate that our method achieves higher unfolding and alignment performance and shorter computation time than previous approaches. Real-world experiments show that the proposed method generalizes well to different garment types and successfully flattens garments.
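A hedged sketch of how the score map could be used to rank and select actions at inference time: pick the pixel with the highest predicted manipulation score and read out the regressed action vector at that location. The thresholding step, function name, and tensor layout are illustrative assumptions, not the paper's exact selection procedure.

```python
# Hypothetical action-selection step using the predicted score map.
import torch

@torch.no_grad()
def select_action(action_field: torch.Tensor, score_map: torch.Tensor,
                  min_score: float = 0.5):
    """action_field: (C, H, W) per-pixel end-effector action vectors.
       score_map:    (1, H, W) per-pixel manipulation confidence in [0, 1]."""
    scores = score_map[0]                           # (H, W)
    best = torch.argmax(scores)                     # flattened index of best pixel
    v, u = divmod(best.item(), scores.shape[1])     # pixel row (v) and column (u)
    if scores[v, u] < min_score:
        return None                                 # no sufficiently confident action
    # Pixel (u, v) is the suggested manipulation point; the action vector at
    # that pixel gives the end-effector motion to execute there.
    return (u, v), action_field[:, v, u]
```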