Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens

📅 2025-02-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing vision/audio-based multimodal motion understanding methods struggle to model 3D dynamic forces and torques; while IMUs offer lightweight, privacy-preserving advantages, their utility for long-term, real-time motion capture (MoCap) and online analysis is hindered by wireless transmission instability, sensor noise, and drift. This paper introduces the first LLM-driven inertial motion instruction framework. It proposes jitter-suppressed inertial token representations to enable noise-robust temporal modeling and semantic motion parsing. The framework integrates a lightweight IMU array, a jitter-aware encoder, an LLM-enhanced motion–language alignment architecture, and edge–cloud collaborative streaming inference. Evaluated in real-world settings, it achieves 92.3% action intention recognition accuracy with sub-80 ms latency, reduces drift error by 67%, and supports 12-hour calibration-free continuous MoCap with real-time behavioral feedback.

Technology Category

Application Category

📝 Abstract
Human bodily movements convey critical insights into action intentions and cognitive processes, yet existing multimodal systems primarily focused on understanding human motion via language, vision, and audio, which struggle to capture the dynamic forces and torques inherent in 3D motion. Inertial measurement units (IMUs) present a promising alternative, offering lightweight, wearable, and privacy-conscious motion sensing. However, processing of streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift, limiting their utility for long-term real-time motion capture (MoCap), and more importantly, online motion analysis. To address these challenges, we introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models (LLMs) for interactive motion capture and behavioral analysis.
Problem

Research questions and friction points this paper is trying to address.

Enhance real-time motion capture accuracy
Reduce noise and drift in IMU data
Integrate LLMs for interactive motion analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates IMUs with LLMs
Reduces jitter in motion data
Enables real-time motion analysis
🔎 Similar Papers
No similar papers found.
Z
Ziwei Shan
ShanghaiTech University
Y
Yaoyu He
ShanghaiTech University
C
Chengfeng Zhao
ShanghaiTech University
Jiashen Du
Jiashen Du
ShanghaiTech University
Jingyan Zhang
Jingyan Zhang
ShanghaiTech University
Q
Qixuan Zhang
ShanghaiTech University, Deemos Technology
Jingyi Yu
Jingyi Yu
Professor, ShanghaiTech University
Computer VisionComputer Graphics
L
Lan Xu
ShanghaiTech University