Multi-Keypoint Affordance Representation for Functional Dexterous Grasping

📅 2025-02-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing functional grasping methods predict only coarse interaction regions, which cannot directly constrain 6D grasp poses and therefore leave a disconnect between visual perception and dexterous manipulation. To bridge this gap, we propose Contact-guided Multi-Keypoint Affordance (CMKA), which explicitly encodes task-driven functional contact points as anchors for 6D grasp pose estimation. Our approach introduces two key components: (1) a contact-guided, weakly supervised keypoint learning mechanism, and (2) a Keypoint-based Grasp matrix Transformation (KGT), which together leverage weak supervision from human grasping images, fine-grained features from Large Vision Models, geometric priors, and robot kinematic mappings to ensure hand-object spatial consistency. Evaluated on the FAH dataset, in IsaacGym simulation, and in real-robot experiments, CMKA significantly improves affordance localization accuracy, grasp pose consistency, and generalization across diverse tools and manipulation tasks.
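
The weakly supervised keypoint learning step ultimately has to turn dense affordance responses into discrete functional contact points. The sketch below shows one common way such a readout is done, a soft-argmax over per-keypoint heatmaps; the array shapes, names, and the soft-argmax choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): reading functional contact
# keypoints out of per-keypoint affordance heatmaps via soft-argmax.
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Differentiable keypoint readout: expected (x, y) pixel location
    under a softmax distribution over the heatmap."""
    h, w = heatmap.shape
    logits = heatmap.flatten() / temperature
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = float((probs * xs.flatten()).sum())
    y = float((probs * ys.flatten()).sum())
    return np.array([x, y])

# Example: K heatmaps (one per functional contact point) -> K keypoints.
rng = np.random.default_rng(0)
heatmaps = rng.random((3, 64, 64))             # stand-in for network output
keypoints = np.stack([soft_argmax_2d(hm) for hm in heatmaps])
print(keypoints.shape)                          # (3, 2)
```

A soft readout like this keeps the keypoint locations differentiable, which matters when the only supervision is weak (e.g., contact regions derived from human grasping images) rather than exact keypoint annotations.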

๐Ÿ“ Abstract
Functional dexterous grasping requires precise hand-object interaction, going beyond simple gripping. Existing affordance-based methods primarily predict coarse interaction regions and cannot directly constrain the grasping posture, leading to a disconnect between visual perception and manipulation. To address this issue, we propose a multi-keypoint affordance representation for functional dexterous grasping, which directly encodes task-driven grasp configurations by localizing functional contact points. Our method introduces Contact-guided Multi-Keypoint Affordance (CMKA), which leverages human grasping experience images for weak supervision, combined with Large Vision Models for fine-grained affordance feature extraction, achieving generalization while avoiding manual keypoint annotations. Additionally, we present a Keypoint-based Grasp matrix Transformation (KGT) method that ensures spatial consistency between hand keypoints and object contact points, providing a direct link between visual perception and dexterous grasping actions. Experiments on the public real-world FAH dataset, in IsaacGym simulation, and on challenging robotic tasks demonstrate that our method significantly improves affordance localization accuracy, grasp consistency, and generalization to unseen tools and tasks, bridging the gap between visual affordance learning and dexterous robotic manipulation. The source code and demo videos will be publicly available at https://github.com/PopeyePxx/MKA.
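
KGT's core requirement is spatial consistency between hand keypoints and object contact points. A standard geometric way to realize that is a least-squares rigid fit (Kabsch/Procrustes) mapping hand keypoints onto the localized contact points, yielding a 6D pose. The sketch below shows that fit under the assumption that keypoint correspondences are already established; it is a plausible geometric reading of KGT, not the paper's exact transformation.

```python
# Minimal sketch, assuming matched hand keypoints (hand frame) and
# object contact points (camera/world frame). Standard Kabsch fit,
# not the paper's exact KGT method.
import numpy as np

def fit_rigid_transform(hand_kps: np.ndarray, contact_pts: np.ndarray):
    """Least-squares rigid transform (R, t) with contact ~= R @ hand + t."""
    assert hand_kps.shape == contact_pts.shape and hand_kps.shape[1] == 3
    ch, co = hand_kps.mean(0), contact_pts.mean(0)   # centroids
    H = (hand_kps - ch).T @ (contact_pts - co)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ ch
    return R, t

# Toy check: recover a known pose from four matched keypoints.
rng = np.random.default_rng(1)
hand = rng.random((4, 3))
a = np.pi / 6
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.1, -0.2, 0.3])
contact = hand @ R_true.T + t_true
R, t = fit_rigid_transform(hand, contact)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

Anchoring the pose to multiple contact points, rather than a single coarse region, is what lets the representation constrain the full 6D grasp rather than just a rough approach direction.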
Problem

Research questions and friction points this paper is trying to address.

Functional dexterous grasping demands precise hand-object interaction, beyond simple gripping
Coarse affordance regions cannot directly constrain 6D grasp poses
Visual affordance learning remains disconnected from dexterous robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-keypoint affordance representation (CMKA) localizing functional contact points
Contact-guided weak supervision with Large Vision Model features, avoiding manual keypoint annotation
Keypoint-based grasp matrix transformation (KGT) linking keypoints to 6D grasp poses
Fan Yang
School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China
Dongsheng Luo
Assistant Professor, Florida International University
Trustworthy AI, Machine Learning, Graph Neural Networks, Time Series
Wenrui Chen
Hunan University
Robotics, Hands, Grasping, Dexterous Manipulation, Human-Robot Collaboration
Jiacheng Lin
University of Illinois Urbana-Champaign
Machine Learning, Foundation Models, Healthcare, Recommendation System
Junjie Cai
School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China
Kailun Yang
Professor, School of Artificial Intelligence and Robotics, Hunan University (HNU); KIT; UAH; ZJU
Computer Vision, Computational Optics, Intelligent Vehicles, Autonomous Driving, Robotics
Zhiyong Li
Professor of Computer Science, Hunan University
Computer Vision, Object Detection
Yaonan Wang
School of Robotics, Hunan University, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, China