🤖 AI Summary
This work addresses drift in motion capture from sparse inertial sensors by exploiting the geometric constraints inherent in ultra-wideband (UWB) ranging measurements, which prior methods have underused. The authors propose a diffusion-based pose estimation framework whose Spatial Layout Module first recovers the 3D sensor positions analytically from UWB distances. These positions, together with the inertial measurement unit (IMU) signals and the UWB distances, then condition the diffusion process. A UWB-guided sampling mechanism additionally steers generated poses toward agreement with the measured inter-sensor distances. Combining analytical layout reconstruction with guided sampling, the method achieves state-of-the-art performance, reducing joint position error by up to 22% compared to prior work.
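To make the layout-reconstruction idea concrete, here is a minimal sketch of how 3D positions can be recovered analytically from a full matrix of pairwise distances. It uses classical multidimensional scaling (MDS), which is an assumption made for illustration: the summary does not say which analytic method the Spatial Layout Module uses. The function names `layout_from_distances` and `pairwise_distances` are hypothetical, and the sketch assumes noise-free, symmetric distances between all sensor pairs.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (N, 3) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def layout_from_distances(dist, dim=3):
    """Classical MDS (illustrative assumption, not the paper's stated method).

    Recovers coordinates from pairwise distances. The result is only
    determined up to rotation, reflection, and translation.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centering turns squared distances into a Gram matrix
    # of the centered coordinates.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # The top `dim` eigenpairs of the Gram matrix give the coordinates.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

Calling `layout_from_distances(pairwise_distances(points))` on six random 3D points returns a layout whose pairwise distances match the input, although the points themselves may be rotated or mirrored. Resolving that ambiguity, for example with IMU orientation, is outside the scope of this sketch.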
📝 Abstract
Methods using inertial measurement units (IMUs) provide a wearable alternative to camera-based motion capture. To mitigate drift from inertial signals, recent sparse inertial pose estimators integrate inter-sensor distances measured by ultra-wideband (UWB) ranging. So far, UWB distances have only been used as an additional input feature, ignoring the physical constraints they impose on sensor positions. However, these distances can also be used to reconstruct the underlying 3D sensor layout, which in turn provides more informative input for pose reconstruction. We propose Ultra Diffusion Poser, a diffusion model that explicitly models these geometric constraints. It includes a Spatial Layout Module that analytically reconstructs the 3D sensor positions from UWB measurements. These sensor positions are used alongside IMU signals and UWB distances as a conditioning signal during diffusion. Still, network predictions can violate inter-sensor distance measurements. To address this, we introduce UWB-Diffusion Guidance, which encourages alignment between predicted poses and measured distances during diffusion sampling. Together, these contributions enable our model to achieve state-of-the-art performance, reducing joint position error by up to 22% over prior work.
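The guidance idea can be illustrated with a small sketch: measure how far a set of predicted sensor positions is from the measured distances, then nudge the positions along the negative gradient of that mismatch. This is a generic distance-consistency step written under stated assumptions, not the authors' UWB-Diffusion Guidance. The names `distance_loss_and_grad` and `guided_step` and the `step` parameter are hypothetical, and `positions` stands in for the sensor locations implied by a predicted pose. In a diffusion sampler, a step like this would typically be applied to the intermediate estimate at each denoising iteration.

```python
import numpy as np


def distance_loss_and_grad(positions, measured):
    """Squared mismatch between predicted and measured pairwise distances.

    positions: (N, 3) predicted sensor positions (hypothetical stand-in).
    measured:  (N, N) symmetric matrix of measured distances.
    Returns the loss 0.5 * sum_{i<j} (||x_i - x_j|| - d_ij)^2 and its
    gradient with respect to `positions`.
    """
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                   # (N, N)
    residual = dist - measured
    np.fill_diagonal(residual, 0.0)  # ignore self-distances
    # Each unordered pair appears twice in the full matrix, hence 0.25.
    loss = 0.25 * np.sum(residual ** 2)
    # Avoid division by zero on the diagonal, where diff is zero anyway.
    safe_dist = np.where(dist > 0, dist, 1.0)
    grad = np.sum((residual / safe_dist)[:, :, None] * diff, axis=1)
    return loss, grad


def guided_step(positions, measured, step=0.05):
    """One gradient step pulling positions toward the measured distances."""
    _, grad = distance_loss_and_grad(positions, measured)
    return positions - step * grad
```

A small step size keeps the correction gentle, so the guidance encourages agreement with the measurements rather than overriding the model's prediction.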