SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automated annotation pipelines have no reliable way to assess the quality of the spatial labels they produce. As a result, they either introduce noisy labels or discard valid samples. This work proposes SPARC, a framework that uses the spatio-temporal structure of robot tasks to calibrate annotation reliability. It generates structured spatial annotations, such as bounding boxes, object trajectories, and manipulation phases, without human verification, and it assigns each annotation a reliability score. SPARC combines object detection, trajectory modeling, and spatio-temporal consistency analysis into a risk-aware automatic annotation pipeline. The work also introduces the IA-Bench benchmark, which measures how accurately models localize the objects a robot interacts with. On 1.7k human-annotated demonstrations, SPARC retains three times more valid samples at high-precision operating points while also improving localization accuracy. Models fine-tuned on SPARC-generated labels achieve state-of-the-art performance among similarly sized models on object-grounding and pointing tasks. Policies trained on SPARC annotations are also more robust in cluttered real-world scenes.
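The summary attributes SPARC's reliability signal to spatio-temporal consistency analysis rather than to detector confidence. However, neither the summary nor the abstract gives the actual formula. As a purely hypothetical illustration, one simple consistency cue is whether a tracked object's bounding box moves plausibly between consecutive frames. The sketch below scores a box track by the fraction of frame-to-frame pairs whose intersection-over-union (IoU) clears a threshold. The functions `iou` and `temporal_consistency` and the `min_iou=0.5` default are our assumptions, not SPARC's method.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_consistency(boxes: List[Box], min_iou: float = 0.5) -> float:
    """Hypothetical reliability proxy (not SPARC's formula): the fraction of
    consecutive-frame box pairs whose IoU is at least min_iou."""
    if len(boxes) < 2:
        raise ValueError("need at least two boxes to measure consistency")
    pairs = zip(boxes[:-1], boxes[1:])
    agree = sum(1 for a, b in pairs if iou(a, b) >= min_iou)
    return agree / (len(boxes) - 1)
```

Consider a track whose box jumps to an unrelated region, as a detector switching to a look-alike distractor might cause. That track scores low even if every individual detection was confident. A per-frame confidence score does not directly reveal this kind of error.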

📝 Abstract

This work introduces Spatial Annotations from Robot Demonstrations with Reliability Calibration (SPARC), a risk-aware framework that automatically labels robot demonstrations with structured spatial annotations and assigns each annotation a reliability score. Structured spatial annotations, such as bounding boxes, object trajectories, and manipulation phase labels, benefit a broad range of robotics applications, from training grounded robot policies and embodied foundation models to motion planning and hierarchical task composition. Existing automated pipelines generate such annotations at scale but provide no reliable quality signal. Detector confidence is poorly calibrated for annotation correctness, which forces a choice between accepting noisy labels and discarding useful samples. SPARC instead leverages the spatio-temporal structure inherent to robot tasks to generate a reliability signal, reducing noisy labels and retaining more useful samples. We further introduce Interaction-Aware Bench (IA-Bench), a benchmark that measures model accuracy in grounding the locations of interacted objects in robot demonstrations. On 1.7k human-annotated demonstrations spanning diverse embodiments and scenarios, SPARC significantly outperforms detection-only baselines in localization accuracy while retaining three times more samples at high-precision operating points. Our experiments demonstrate that models fine-tuned on our annotations, without any manually verified or annotated training data, achieve state-of-the-art results on object-grounding and pointing benchmarks among similarly sized models while remaining competitive on broader spatial-reasoning suites. Furthermore, policies trained on SPARC-generated annotations outperform baselines in cluttered, visually ambiguous real-world scenes. Code, data, and models are available at intuitive-robots.github.io/sparc-labeling.
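The abstract's comparison at "high-precision operating points" describes a precision/coverage trade-off. Given a per-annotation score and a correctness label (for example, derived from the human-annotated demonstrations), one picks a score threshold that keeps precision at or above a target and then counts how many samples survive. The sketch below computes that retention for any score, whether detector confidence or a reliability score. The function name `retention_at_precision` and its handling of tied scores are our assumptions, not SPARC's evaluation code.

```python
from typing import List, Optional, Tuple


def retention_at_precision(
    scores: List[float], correct: List[bool], target_precision: float
) -> Tuple[Optional[float], int]:
    """Return (threshold, n_kept) for the largest set {score >= threshold}
    whose precision is at least target_precision.
    Returns (None, 0) if no threshold reaches the target."""
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    # Rank samples from most to least trusted.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best: Tuple[Optional[float], int] = (None, 0)
    n_correct = 0
    for rank, i in enumerate(order, start=1):
        n_correct += int(correct[i])
        # Only cut between distinct scores so "score >= threshold" is well defined.
        if rank < len(order) and scores[order[rank]] == scores[i]:
            continue
        # Precision is not monotonic in the threshold, so keep the
        # largest retained set that still meets the target.
        if n_correct / rank >= target_precision:
            best = (scores[i], rank)
    return best
```

For example, `retention_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [True, True, False, True, False], 0.75)` keeps the top four samples at threshold 0.6. Running the function once with detector confidence and once with a reliability score over the same labelled samples yields the kind of retention comparison the abstract reports. These inputs are toy values, not the paper's data.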

Problem

Research questions and friction points this paper is trying to address.

spatial annotations

robot demonstrations

reliability calibration

annotation quality

object grounding

Innovation

Methods, ideas, or system contributions that make the work stand out.

reliability calibration

structured spatial annotations

robot demonstrations