GHOST: Fast Category-agnostic Hand-Object Interaction Reconstruction from RGB Videos using Gaussian Splatting

📅 2026-03-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Reconstructing physically consistent, category-agnostic dynamic hand-object interactions from monocular RGB videos remains challenging: existing approaches either rely on category-specific templates or are computationally expensive, and they still struggle to ensure 3D fidelity. This work proposes a unified representation based on 2D Gaussian Splatting, modeling both hands and objects as dense, view-consistent Gaussian discs to enable efficient, complete, and animatable 3D reconstruction. Key innovations include geometric-prior retrieval with a consistency loss to recover occluded object regions, a grasp-aware alignment mechanism that jointly refines hand translation and object scale, and a hand-aware background loss that avoids penalizing object parts occluded by the hand. The method achieves state-of-the-art 3D accuracy and 2D rendering quality on ARCTIC, HO3D, and in-the-wild datasets while running an order of magnitude faster than existing category-agnostic approaches.

πŸ“ Abstract
Understanding realistic hand-object interactions from monocular RGB videos is essential for AR/VR, robotics, and embodied AI. Existing methods rely on category-specific templates or heavy computation, yet still produce physically inconsistent hand-object alignment in 3D. We introduce GHOST (Gaussian Hand-Object Splatting), a fast, category-agnostic framework for reconstructing dynamic hand-object interactions using 2D Gaussian Splatting. GHOST represents both hands and objects as dense, view-consistent Gaussian discs and introduces three key innovations: (1) a geometric-prior retrieval and consistency loss that completes occluded object regions, (2) a grasp-aware alignment that refines hand translations and object scale to ensure realistic contact, and (3) a hand-aware background loss that prevents penalizing hand-occluded object regions. GHOST achieves complete, physically consistent, and animatable reconstructions from a single RGB video while running an order of magnitude faster than prior category-agnostic methods. Extensive experiments on ARCTIC, HO3D, and in-the-wild datasets demonstrate state-of-the-art accuracy in 3D reconstruction and 2D rendering quality, establishing GHOST as an efficient and robust solution for realistic hand-object interaction modeling. Code is available at https://github.com/ATAboukhadra/GHOST.
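The abstract names a hand-aware background loss but does not give its formula. The sketch below is one plausible reading, not the authors' formulation: it penalizes rendered object opacity only at pixels that are neither object nor hand, so object Gaussians lying behind the hand are left unpenalized. The function name, the three per-pixel inputs (`object_alpha`, `object_mask`, `hand_mask`), and the mean-opacity form of the penalty are all assumptions made for illustration.

```python
import numpy as np


def hand_aware_background_loss(object_alpha, object_mask, hand_mask):
    """Mean rendered object opacity over pixels that are neither object nor hand.

    Hypothetical helper for illustration only; the paper's loss may differ.
    object_alpha: (H, W) float array, rendered opacity of the object Gaussians.
    object_mask:  (H, W) bool array, True where the visible object is segmented.
    hand_mask:    (H, W) bool array, True where the hand is segmented.
    """
    object_alpha = np.asarray(object_alpha, dtype=np.float64)
    # Start from every pixel outside the visible object...
    background = ~np.asarray(object_mask, dtype=bool)
    # ...then drop hand pixels, since the object may continue behind the hand.
    background &= ~np.asarray(hand_mask, dtype=bool)
    count = background.sum()
    if count == 0:
        return 0.0
    return float(object_alpha[background].sum() / count)


# Toy 2x2 frame: the top-right pixel is covered by the hand.
alpha = np.array([[0.0, 0.8], [0.6, 1.0]])
obj = np.array([[False, False], [False, True]])
hand = np.array([[False, True], [False, False]])

masked = hand_aware_background_loss(alpha, obj, hand)
naive = hand_aware_background_loss(alpha, obj, np.zeros_like(hand))
print(masked, naive)
```

In the toy frame, the opacity of 0.8 rendered under the hand contributes to the naive penalty but is ignored once the hand mask is applied, which is the behavior the abstract describes for hand-occluded object regions.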
Problem

Research questions and friction points this paper is trying to address.

hand-object interaction
category-agnostic reconstruction
monocular RGB video
3D consistency
occlusion handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
hand-object interaction
category-agnostic reconstruction
geometric prior
grasp-aware alignment
Ahmed Tawfik Aboukhadra
RPTU, DFKI-AV Kaiserslautern
Marcel Rogge
RPTU, DFKI-AV Kaiserslautern
Nadia Robertini
DFKI-AV Kaiserslautern
Abdalla Arafa
RPTU, DFKI-AV Kaiserslautern
Jameel Malik
NUST-SEECS Pakistan
Ahmed Elhayek
UPM Saudi Arabia
Didier Stricker
Professor for Computer Science, University Kaiserslautern
augmented reality, computer vision, image processing, body sensor networks, HCI