🤖 AI Summary
To address high point-to-point latency in interactive environments, this paper proposes a real-time grasp intention prediction method based solely on hand kinematics during reach-to-grasp actions. The approach feeds real-time joint trajectory sequences into a simple LSTM network that simultaneously predicts the grasp time point, the current distance to the target, and the target's size. In a study with 16 participants reaching out to grasp and move real and synthetic objects, the network predicted the moment of grasp with a precision better than 21 ms, the distance to the target with a precision better than 1 cm, and the target's size with an accuracy better than 97%. Because recognition requires no instrumentation of the objects themselves, the method supplies low-latency contextual information about user intent, a foundation for adaptive and fine-grained user interfaces in ubiquitous and mixed-reality environments.
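The summary describes an LSTM that consumes per-frame joint-trajectory features and emits three outputs at once: time to grasp, distance to target, and a size class. A minimal NumPy sketch of such a multi-task recurrent model is shown below; all layer sizes, feature dimensions, and head names are illustrative assumptions, since the paper's exact architecture is not given here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGraspLSTM:
    """Single-layer LSTM with three task heads (all dimensions illustrative)."""

    def __init__(self, n_in, n_hid, n_sizes=3):
        w = lambda *shape: rng.normal(0.0, 0.1, shape)
        # Stacked weights for the input/forget/cell/output gates.
        self.W = w(4 * n_hid, n_in + n_hid)
        self.b = np.zeros(4 * n_hid)
        self.w_time = w(n_hid)            # time-to-grasp regression head
        self.w_dist = w(n_hid)            # distance-to-target regression head
        self.W_size = w(n_sizes, n_hid)   # target-size classification head
        self.n_hid = n_hid

    def forward(self, seq):
        """seq: (T, n_in) array of hand features, one row per tracked frame."""
        h = np.zeros(self.n_hid)
        c = np.zeros(self.n_hid)
        for x in seq:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        logits = self.W_size @ h
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return {
            "time_to_grasp": float(self.w_time @ h),
            "distance": float(self.w_dist @ h),
            "size_probs": e / e.sum(),
        }

# Example: 30 frames of 63-dim hand features (e.g. 21 joints x 3 coordinates).
model = TinyGraspLSTM(n_in=63, n_hid=32)
out = model.forward(rng.normal(size=(30, 63)))
```

After each new tracking frame, the hidden state can simply be carried forward and the heads re-read, which is what makes a small recurrent model attractive for frame-rate prediction.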
📝 Abstract
The ability to predict the object the user intends to grasp offers essential contextual information and may help to mitigate the effects of point-to-point latency in interactive environments. This paper explores the feasibility and accuracy of real-time recognition of uninstrumented objects based on hand kinematics during reach-to-grasp actions. In a data collection study, we recorded the hand motions of 16 participants while reaching out to grasp and then moving real and synthetic objects. Our results demonstrate that even a simple LSTM network can predict the time point at which the user grasps an object with a precision better than 21 ms and the current distance to this object with a precision better than 1 cm. The target's size can be determined in advance with an accuracy better than 97%. Our results have implications for designing adaptive and fine-grained interactive user interfaces in ubiquitous and mixed-reality environments.