UMI-Bench 1.0: An Open and Reproducible Real-World Benchmark for Tabletop Robotic Manipulation with UMI Data

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the absence of standardized, reproducible benchmarks for real-world evaluation of Universal Manipulation Interface (UMI)-style manipulation policies. It introduces UMI-Bench 1.0, which the authors describe as, to the best of their knowledge, the first physical-world evaluation platform designed specifically for UMI policies. The platform standardizes the entire pipeline, from data collection to deployment evaluation, through unified protocols for data acquisition, automated scene resetting, policy execution, and structured logging. Built on the UMI data paradigm, it integrates wrist-mounted visual observations with canonical action representations, enabling open, auditable, and quantitative assessment of policy generalization and reliability in tabletop manipulation tasks.

📝 Abstract

Real-robot evaluation is essential for understanding whether learned manipulation policies can operate reliably outside curated demonstrations. This need is particularly pressing for Universal Manipulation Interface (UMI)-style policies, whose performance depends on the coupling between wrist-view observations, action representation, data collection, and physical deployment. Existing real-world benchmarks have made important progress, but they are not designed around this UMI data-to-deployment setting. We present UMI-Bench 1.0, a local-first real-robot benchmark for standardized evaluation of UMI-style manipulation policies. To the best of our knowledge, this is the first benchmark dedicated to real-world evaluation of UMI-based manipulation models. UMI-Bench aligns data collection, scene reset, policy execution, result logging, and task-factor analysis within a unified protocol. By making the full evaluation process reproducible and auditable, UMI-Bench provides a practical testbed for measuring how UMI-trained policies generalize to real physical manipulation.
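
The abstract lists scene reset, policy execution, result logging, and task-factor analysis as stages of one protocol. The sketch below illustrates how such a loop might be organized. It is not the UMI-Bench implementation: every name in it (`TrialRecord`, `run_benchmark`, `success_rate_by_factor`, `to_jsonl`, the `lighting` factor, the `pick_cup` task) is a hypothetical stand-in, and the reset and policy steps are plain callables so the example runs without a robot.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List


@dataclass
class TrialRecord:
    """One structured log entry per rollout (field names are hypothetical)."""
    task: str
    factors: Dict[str, str]  # task factors varied in this trial, e.g. {"lighting": "dim"}
    trial: int
    success: bool


def run_benchmark(
    task: str,
    factor_settings: List[Dict[str, str]],
    reset_scene: Callable[[Dict[str, str]], None],
    run_policy: Callable[[Dict[str, str]], bool],
    trials_per_setting: int = 3,
) -> List[TrialRecord]:
    """Reset the scene, execute the policy, and record the outcome for every trial."""
    records: List[TrialRecord] = []
    for factors in factor_settings:
        for trial in range(trials_per_setting):
            reset_scene(factors)           # restore the scene before each rollout
            success = run_policy(factors)  # assumed to return True on task success
            records.append(TrialRecord(task, dict(factors), trial, success))
    return records


def success_rate_by_factor(records: List[TrialRecord], factor: str) -> Dict[str, float]:
    """Group trials by the value of one task factor and compute success rates."""
    totals: Dict[str, List[int]] = {}
    for record in records:
        wins_and_count = totals.setdefault(record.factors[factor], [0, 0])
        wins_and_count[0] += int(record.success)
        wins_and_count[1] += 1
    return {value: wins / count for value, (wins, count) in totals.items()}


def to_jsonl(records: List[TrialRecord]) -> str:
    """Serialize records as one JSON object per line so results can be audited later."""
    return "\n".join(json.dumps(asdict(record)) for record in records)


if __name__ == "__main__":
    # Stub policy for illustration only: succeeds under bright lighting, fails otherwise.
    results = run_benchmark(
        task="pick_cup",
        factor_settings=[{"lighting": "bright"}, {"lighting": "dim"}],
        reset_scene=lambda factors: None,
        run_policy=lambda factors: factors["lighting"] == "bright",
        trials_per_setting=2,
    )
    print(to_jsonl(results))
    print(success_rate_by_factor(results, "lighting"))
```

Writing one record per trial, rather than only an aggregate score, is what would let a reader recompute any per-factor breakdown from the raw log.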

Problem

Research questions and friction points this paper is trying to address.

UMI

robotic manipulation

real-world benchmark

policy evaluation

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

UMI-Bench

real-world benchmark

robotic manipulation

reproducible evaluation

Universal Manipulation Interface

