EgoTactile: Learning Grasp Pressure for Everyday Objects from Egocentric Video

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing methods in complex 3D object interactions by proposing a novel paradigm for estimating full-hand grasping pressure from first-person videos. To this end, we introduce EgoTactile, the first benchmark pairing egocentric video with full-hand tactile pressure annotations, including a bare-hand transfer subset to enable cross-scenario evaluation. Methodologically, we present EgoPressureDiff, a conditional diffusion model that integrates a pretrained video diffusion backbone, physics-informed feature correction layers, and a vision–tactile alignment mechanism, effectively combining world-knowledge priors with physical constraints to resolve visual–physical ambiguities. Experiments demonstrate that our approach significantly outperforms current methods on EgoTactile and exhibits strong robustness and transferability in real-world scenarios.
📝 Abstract
Estimating full-hand grasp pressure from egocentric video is critical for immersive VR and robotic manipulation, yet dense tactile sensing often relies on intrusive hardware. Existing vision-based methods predominantly rely on planar surfaces or fingertip contacts, failing to generalize to complex 3D object interactions. Therefore, we introduce EgoTactile, a benchmark pairing egocentric video with full-hand pressure supervision for diverse everyday objects, incorporating a bare-hand transfer subset to enable generalization to natural scenarios. Leveraging this benchmark, we first establish EgoPressureFormer as a discriminative baseline. Beyond this, to explicitly address the uncertainty in partial observations, we propose EgoPressureDiff, a conditional diffusion framework that adapts a large-scale pre-trained video diffusion backbone. By combining rich world knowledge priors with a Physically-Informed Feature Rectification layer to inject semantic constraints, our approach effectively infers plausible contact patterns and resolves visual-physical ambiguities. Extensive experiments demonstrate that our method achieves superior performance on the benchmark and robust transferability to in-the-wild scenarios. Our project page is available at https://egotactile.github.io/.
Problem

Research questions and friction points this paper is trying to address.

grasp pressure
egocentric video
tactile sensing
3D object interaction
full-hand contact
Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric vision
grasp pressure estimation
conditional diffusion model
tactile sensing
physically-informed learning
Yuan Zeng
Associate professor, Shenzhen Technology University, Southern University of Science and Technology
Image inpainting and 3D reconstruction
Yujia Shi
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; Department of Network, Pengcheng Laboratory, Shenzhen, China
Tiao Tan
PhD, Tsinghua University
Computer vision, embodied AI
Xingting Li
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Yaqi Qin
JQ Industries, Qingdao, China
Zongqing Lu
Peking University | BeingBeyond
Reinforcement learning
Wenming Yang
Tsinghua University
Computer Vision, Image Processing
Jing-Hao Xue
Professor, Department of Statistical Science, University College London
Statistical Pattern Recognition, Machine Learning, Image Processing
Qingmin Liao
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China