Egocentric Action-aware Inertial Localization in Point Clouds

πŸ“… 2025-05-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses inertial localization from head-mounted IMUs within egocentric 3D point clouds, tackling two core challenges: trajectory drift induced by IMU sensor noise and the difficulty of modeling diverse human motions. The authors propose EAIL, an end-to-end framework that jointly optimizes localization and action recognition. First, it leverages human actions as spatial anchors to compensate for IMU drift. Second, it introduces a hierarchical multimodal contrastive alignment mechanism that fuses IMU time-series signals with local geometric features from the point cloud and reasons over them across time and space. Evaluated on multiple benchmarks, EAIL achieves state-of-the-art performance, reducing localization error by 32% and attaining 91.4% accuracy in action sequence recognition.


πŸ“ Abstract
This paper presents a novel inertial localization framework named Egocentric Action-aware Inertial Localization (EAIL), which leverages egocentric action cues from head-mounted IMU signals to localize the target individual within a 3D point cloud. Human inertial localization is challenging due to IMU sensor noise, which causes trajectory drift over time. The diversity of human actions further complicates IMU signal processing by introducing varied motion patterns. Nevertheless, we observe that some actions captured by the head-mounted IMU correlate with spatial environmental structures (e.g., bending down to look inside an oven, washing dishes next to a sink), and can therefore serve as spatial anchors to compensate for localization drift. The proposed EAIL framework learns such correlations via hierarchical multi-modal alignment. Assuming that a 3D point cloud of the environment is available, it contrastively learns modality encoders that align short-term egocentric action cues in IMU signals with local environmental features in the point cloud. These encoders are then used to reason over the IMU data and the point cloud across time and space to perform inertial localization. Interestingly, the same encoders can further be utilized to recognize the corresponding sequence of actions as a by-product. Extensive experiments demonstrate the effectiveness of the proposed framework over state-of-the-art inertial localization and inertial action recognition baselines.
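The contrastive alignment the abstract describes pairs an IMU-window embedding with the embedding of the local point-cloud region where that action occurred. The paper does not publish implementation details, so the sketch below is only illustrative: it shows a standard symmetric InfoNCE objective (as used in CLIP-style multi-modal alignment) over hypothetical encoder outputs, with matched IMU/point-cloud pairs on the diagonal of the similarity matrix and the rest of the batch acting as negatives. The encoder architectures, batch construction, and temperature are all assumptions, not EAIL's actual design.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_info_nce(imu_emb, pc_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Row i of `imu_emb` (an IMU-window embedding) and row i of `pc_emb`
    (the embedding of the co-located point-cloud patch) form a positive
    pair; all other rows in the batch serve as negatives.
    """
    # L2-normalize so the dot products below are cosine similarities
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    pc = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    logits = imu @ pc.T / temperature  # (N, N); matches lie on the diagonal
    diag = np.arange(logits.shape[0])
    loss_i2p = -log_softmax(logits, axis=1)[diag, diag].mean()  # IMU -> cloud
    loss_p2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # cloud -> IMU
    return 0.5 * (loss_i2p + loss_p2i)

# Toy usage with random stand-ins for learned encoder outputs
rng = np.random.default_rng(0)
imu_feats = rng.standard_normal((8, 32))                    # 8 windows, 32-d
pc_feats = imu_feats + 0.1 * rng.standard_normal((8, 32))   # near-aligned pairs
loss = symmetric_info_nce(imu_feats, pc_feats)
```

Because positives sit on the diagonal, the loss is low when each IMU window is most similar to its own point-cloud patch and rises when pairs are mismatched, which is the property that lets the learned encoders act as spatial anchors downstream.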
Problem

Research questions and friction points this paper is trying to address.

Localizing individuals in 3D point clouds using head-mounted IMU signals
Compensating for IMU trajectory drift via action-environment correlations
Aligning egocentric action cues with spatial environmental features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Egocentric action cues for 3D localization
Hierarchical multi-modal alignment learning
Contrastive learning of IMU and point cloud