🤖 AI Summary
Existing resources do not isolate nonverbal communicative intent conveyed solely through body pose. Affective corpora mix body, face, voice, and text, and skeleton action-recognition benchmarks label the action performed rather than the message conveyed. Solutions that run in real time, at low cost, and at long range on embedded robotic platforms are also lacking. This work releases a dataset of real frames containing only 2D full-body poses annotated with ten distinct communicative intents. It compares this dataset with other real (IPC) and synthetic (MotionLCM, VEO3.1, Kimodo) datasets, and it benchmarks models ranging from skeleton graph classifiers to joint motion-forecasting networks, reporting both performance metrics and frame rate on an NVIDIA Orin Nano. The study also uses a model's own autoregressive self-consistency as an unsupervised reliability signal. It gives a short proof bounding the probability that a self-consistent prediction is correct, shows that this probability grows with the number of consistent steps, and identifies the conditions under which a high-confidence prediction can still be wrong. The signal is benchmarked against industry-standard metrics.
📝 Abstract
Body movement communicates intent at distances and in conditions where neither the face nor speech can be captured. We study the recognition of communicative intent from 2D body pose alone. We argue that body motion is a particularly reliable signal in scenarios that require real-time, low-cost, on-device person-to-robot communication over long distances, such as rescue missions. However, existing resources do not isolate this signal. Affective corpora combine body, face, voice, and text, while skeleton action-recognition benchmarks label the action performed rather than the message conveyed. We release a dataset of real frames of full-body pose covering ten communicative intents and compare it against other real (IPC) and synthetic (MotionLCM, VEO3.1, Kimodo) datasets that span a range of difficulty. We target systems that can run on a robot's limited onboard hardware. We benchmark multiple models, from skeleton graph classifiers to joint motion-forecasting networks, and report performance metrics together with frame rate on an embedded GPU (NVIDIA Orin Nano), since in our scenario speed matters as much as accuracy. Finally, we show that a model's own autoregressive self-consistency serves as an unsupervised reliability signal. We give a short proof that bounds the probability that a self-consistent prediction is correct, show that this probability grows with the number of consistent steps, and identify the conditions under which a confident prediction can still be false. We benchmark this signal against industry-standard metrics.
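As an illustration only, the sketch below shows one way a self-consistency signal could be computed, and why agreement across steps can raise confidence. It is not the authors' proof or implementation. It assumes, hypothetically, that the model emits one intent label per autoregressive step. It also uses a toy model in which each step is correct independently with probability p and otherwise picks one of the K − 1 wrong intents uniformly at random (K = 10 here, matching the ten intents). The helper names `consistent_run_length` and `posterior_correct` are hypothetical.

```python
from typing import Sequence


def consistent_run_length(step_labels: Sequence[str]) -> int:
    """Count how many trailing autoregressive steps agree with the final label.

    Hypothetical helper: assumes the model emits one intent label per step.
    """
    if not step_labels:
        return 0
    final = step_labels[-1]
    run = 0
    for label in reversed(step_labels):
        if label != final:
            break
        run += 1
    return run


def posterior_correct(p: float, num_classes: int, k: int) -> float:
    """P(prediction correct | k consecutive agreeing steps) in a toy model.

    Toy assumptions (not from the paper): each step is correct independently
    with probability p; otherwise it picks one of the num_classes - 1 wrong
    labels uniformly at random.
    """
    if not 0.0 <= p <= 1.0 or num_classes < 2 or k < 1:
        raise ValueError("need 0 <= p <= 1, num_classes >= 2, k >= 1")
    q = (1.0 - p) / (num_classes - 1)  # probability of one specific wrong label
    agree_on_true = p ** k  # all k steps pick the true label
    agree_on_wrong = (num_classes - 1) * q ** k  # all k steps pick the same wrong label
    return agree_on_true / (agree_on_true + agree_on_wrong)


if __name__ == "__main__":
    steps = ["wave", "stop", "stop", "stop"]  # hypothetical per-step intent labels
    print(consistent_run_length(steps))  # 3
    for n in (1, 2, 3):
        print(n, round(posterior_correct(0.7, 10, n), 4))
```

Under these toy assumptions, with p = 0.7 and K = 10, the posterior is about 0.7 after one step and about 0.98 after two agreeing steps. The independence assumption is exactly what can break in practice. If errors are correlated across steps (the model locks onto the same wrong intent), agreement adds little evidence, which is one way a confident prediction can still be false. If p falls below (1 − p)/(K − 1), agreement makes the prediction less likely, not more likely, to be correct.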