M3DHMR: Monocular 3D Hand Mesh Recovery

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

career value

214K/year

🤖 AI Summary

Monocular 3D hand mesh reconstruction suffers from severe self-occlusion, strong 2D–3D mapping ambiguity, and high joint degrees of freedom, leading to low vertex localization accuracy and poor inference efficiency. To address these challenges, this paper proposes an end-to-end vertex-level regression framework. Its key contributions are: (1) a novel Dynamic Spiral Convolution (DSC) layer that jointly adapts spatial and channel-wise features according to hand topology; (2) an anatomy-aware Region-of-Interest (ROI) attention mechanism to enhance representation learning for critical joints and occluded regions; and (3) a lightweight, 2D-guided 3D vertex regression architecture. Evaluated on the FreiHAND benchmark, our method outperforms existing real-time approaches, achieving a 12.6% reduction in vertex error and an inference speed of 38 FPS—demonstrating both state-of-the-art accuracy and efficiency.

Technology Category

Application Category

📝 Abstract

Monocular 3D hand mesh recovery is challenging due to high degrees of freedom of hands, 2D-to-3D ambiguity and self-occlusion. Most existing methods are either inefficient or less straightforward for predicting the position of 3D mesh vertices. Thus, we propose a new pipeline called Monocular 3D Hand Mesh Recovery (M3DHMR) to directly estimate the positions of hand mesh vertices. M3DHMR provides 2D cues for 3D tasks from a single image and uses a new spiral decoder consist of several Dynamic Spiral Convolution (DSC) Layers and a Region of Interest (ROI) Layer. On the one hand, DSC Layers adaptively adjust the weights based on the vertex positions and extract the vertex features in both spatial and channel dimensions. On the other hand, ROI Layer utilizes the physical information and refines mesh vertices in each predefined hand region separately. Extensive experiments on popular dataset FreiHAND demonstrate that M3DHMR significantly outperforms state-of-the-art real-time methods.

Problem

Research questions and friction points this paper is trying to address.

Challenges in monocular 3D hand mesh recovery due to high DoF and occlusion

Inefficient existing methods for predicting 3D mesh vertex positions

Proposing M3DHMR for direct vertex estimation with 2D cues and spiral decoder

Innovation

Methods, ideas, or system contributions that make the work stand out.

Directly estimates hand mesh vertices positions

Uses Dynamic Spiral Convolution (DSC) Layers

Refines mesh vertices with ROI Layer

🔎 Similar Papers

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

2024-09-18arXiv.orgCitations: 8

ByteDance

San Jose

Research Scientist Intern, Machine Perception for Input and Interaction (PhD)