🤖 AI Summary
Existing handheld data collection systems often lack robust tactile sensing and reliable pose tracking for bimanual, contact-rich manipulation tasks. To address these limitations, we propose ViTaMIn-B, a handheld data collection system built around DuoTact, a compliant visuo-tactile sensor whose flexible frame withstands large contact forces while capturing high-resolution contact geometry. To improve cross-sensor generalizability, the sensor's global deformation is reconstructed as a 3D point cloud and used as the policy input. We further design a unified 6-DoF bimanual pose acquisition process based on Meta Quest controllers, which eliminates the trajectory drift common in SLAM-based methods. User studies confirm efficiency and high usability among both novice and expert operators. Experiments on four bimanual manipulation tasks show superior task performance relative to existing systems.
📝 Abstract
Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack the robust tactile sensing or reliable pose tracking needed to handle complex interaction scenarios, especially bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame that withstands large contact forces during manipulation while capturing high-resolution contact geometry. To enhance cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift common in SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems.