HandCept: A Visual-Inertial Fusion Framework for Accurate Proprioception in Dexterous Hands

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dexterous hands suffer from noisy joint-angle estimation, severe inertial drift, and poor generalization under highly dynamic conditions, all key bottlenecks in proprioceptive sensing. Method: We propose a real-time, tightly coupled visual-inertial proprioception framework. It introduces, for the first time, a zero-shot-learning-driven architecture that fuses RGB-D data with multi-IMU (9-axis) measurements; employs a latency-free Extended Kalman Filter (EKF) for millisecond-level sensor fusion; and adopts a unified inter-IMU reference frame to simplify calibration. Contribution/Results: The method achieves joint-angle estimation errors of only 2°–4° with no observable drift throughout operation, significantly outperforming both vision-only and inertial-only baselines. In addition, we open-source a high-fidelity differentiable rendering pipeline, enabling efficient sim-to-real transfer for rapid model adaptation.
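The summary describes millisecond-level EKF fusion of IMU rates with vision-based joint-angle estimates. A minimal single-joint sketch of that idea follows; the state layout, noise values, and constant-rate motion model are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Single-joint visual-inertial EKF sketch. State: [joint angle theta, rate omega].
# The IMU-derived rate drives the prediction step; a vision-based angle
# measurement corrects the accumulated drift in the update step.

def ekf_step(x, P, gyro_rate, vision_angle, dt,
             q_rate=1e-3, r_gyro=1e-2, r_vision=4.0):
    # Predict: integrate the latest gyro rate over dt.
    F = np.array([[1.0, dt], [0.0, 0.0]])
    x = np.array([x[0] + gyro_rate * dt, gyro_rate])
    Q = np.diag([q_rate * dt, r_gyro])
    P = F @ P @ F.T + Q
    # Update: vision observes the joint angle directly (H = [1, 0]).
    H = np.array([[1.0, 0.0]])
    S = H @ P @ H.T + r_vision          # innovation covariance
    K = P @ H.T / S                     # Kalman gain (2x1)
    x = x + (K * (vision_angle - x[0])).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Repeated calls pull the angle estimate toward the drift-free vision measurement while the gyro rate keeps the estimate responsive between vision updates.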

📝 Abstract
As robotics progresses toward general manipulation, dexterous hands are becoming increasingly critical. However, proprioception in dexterous hands remains a bottleneck due to limitations in volume and generality. In this work, we present HandCept, a novel visual-inertial proprioception framework designed to overcome the challenges of traditional joint angle estimation methods. HandCept addresses the difficulty of achieving accurate and robust joint angle estimation in dynamic environments where both visual and inertial measurements are prone to noise and drift. It leverages a zero-shot learning approach using a wrist-mounted RGB-D camera and 9-axis IMUs, fused in real time via a latency-free Extended Kalman Filter (EKF). Our results show that HandCept achieves joint angle estimation errors between $2^{\circ}$ and $4^{\circ}$ without observable drift, outperforming visual-only and inertial-only methods. Furthermore, we validate the stability and uniformity of the IMU system, demonstrating that a common base frame across IMUs simplifies system calibration. To support sim-to-real transfer, we also open-sourced our high-fidelity rendering pipeline, which is essential for training without real-world ground truth. This work offers a robust, generalizable solution for proprioception in dexterous hands, with significant implications for robotic manipulation and human-robot interaction.
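The abstract notes that a common base frame across IMUs simplifies system calibration. A minimal sketch of that idea, where each IMU's fixed mounting rotation is estimated once and every reading is rotated into the shared frame (the function name and example rotation are illustrative assumptions, not the authors' calibration procedure):

```python
import numpy as np

# Each IMU i has a fixed mounting rotation R_base_imu (base <- IMU frame),
# estimated once during calibration. Rotating every reading into the shared
# base frame means downstream fusion never reasons about per-IMU frames.

def to_base_frame(omega_imu, R_base_imu):
    """Rotate an angular-velocity reading from its IMU frame into the base frame."""
    return R_base_imu @ omega_imu

# Example: an IMU mounted with a 90-degree yaw offset relative to the base.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
omega = np.array([1.0, 0.0, 0.0])   # rotation about the IMU's x-axis
print(to_base_frame(omega, R))      # appears as rotation about the base y-axis
```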
Problem

Research questions and friction points this paper is trying to address.

Accurate joint angle estimation in dexterous hands
Robust proprioception in dynamic, noisy environments
Overcoming limitations of visual-inertial sensor fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual-inertial fusion for accurate hand proprioception
Zero-shot learning with RGB-D camera and IMUs
Latency-free Extended Kalman Filter for real-time fusion
Junda Huang
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong
Jianshu Zhou
University of California, Berkeley
Robotics · Dexterous Manipulation · Robotic Hands · Embodied Intelligence · Human–Robot Interface
Honghao Guo
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong
Yunhui Liu
Nanjing University