🤖 AI Summary
Using dual robotic arms to precisely align the textures of deformable fabrics and bring them into contact under grayscale visual guidance remains challenging, particularly because real-world annotated data are scarce and there is a domain shift between simulation and reality.
Method: We propose an end-to-end visual servoing framework that requires no real labeled data. It integrates a Transformer backbone with a novel Difference Extraction Attention Module (DEAM) to improve the accuracy of pose deviation estimation, and couples it with dual-arm cooperative impedance control to jointly regulate fabric pose and tension during contact, keeping the fabric surface flat. Training relies solely on synthetic grayscale images.
Contribution/Results: The method achieves zero-shot transfer to real-world settings without fine-tuning. Experiments demonstrate high-precision alignment across diverse fabric textures, robust cross-domain generalization, and significantly improved practicality and scalability compared to prior approaches.
📝 Abstract
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We introduce a novel control scheme that combines Transformer-driven visual servoing with dual-arm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our Transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
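To make the control idea concrete, the following is a minimal sketch of how a predicted pose difference could drive a servoing loop while a separate term maintains fabric tension. The abstract does not give the control laws, so everything here is an assumption: a generic proportional servo law and a one-dimensional spring-damper tension term stand in for the paper's scheme, the network is replaced by the true pose difference, and all names (`PoseDiff`, `servo_twist`, `tension_force`, `run_servo`) and gain values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoseDiff:
    """Planar pose difference (current minus target) of the held fabric."""
    dx: float      # metres
    dy: float      # metres
    dtheta: float  # radians


def clip(value, limit):
    """Saturate value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def servo_twist(diff, gain=2.0, v_max=0.05, w_max=0.5):
    """Proportional servo law (assumed): command a velocity opposing the
    predicted pose difference, saturated to illustrative speed limits."""
    return (
        clip(-gain * diff.dx, v_max),
        clip(-gain * diff.dy, v_max),
        clip(-gain * diff.dtheta, w_max),
    )


def tension_force(separation, separation_rate, target_separation,
                  stiffness=200.0, damping=20.0):
    """Spring-damper term (assumed) along the axis between the two grippers.
    A positive value pushes the grippers apart, stretching the fabric."""
    return stiffness * (target_separation - separation) - damping * separation_rate


def run_servo(initial, steps=200, dt=0.05):
    """Integrate the servo law, using the true pose difference as a
    stand-in for the network's prediction."""
    diff = PoseDiff(initial.dx, initial.dy, initial.dtheta)
    for _ in range(steps):
        vx, vy, w = servo_twist(diff)
        diff = PoseDiff(diff.dx + vx * dt, diff.dy + vy * dt, diff.dtheta + w * dt)
    return diff


if __name__ == "__main__":
    final = run_servo(PoseDiff(0.02, -0.01, 0.1))
    print(final)
    print(tension_force(0.30, 0.0, 0.31))
```

In a real system the pose difference would come from the network's prediction on the current and target grayscale images, and the tension term would be one component of a full dual-arm impedance controller rather than a scalar force.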