DuoBench: A Reproducible Benchmark for Bimanual Manipulation in Simulation and the Real World

📅 2026-06-10
🤖 AI Summary
Existing benchmarks do not adequately capture the control complexity and failure modes of dual-arm coordination. To address this gap, this work proposes DuoBench, an extensible and reproducible benchmarking framework for bimanual manipulation policies on the FR3 Duo platform. It comprises eleven tasks across four coordination categories, implemented in simulation and partially reproduced in the real world through task recipes with 3D-printable assets. DuoBench adds a stage-based evaluation scheme for fine-grained semantic failure analysis beyond binary success and provides human-teleoperated datasets for all benchmark tasks. Experiments with dual-arm imitation-learning and vision-language-action policies show that current policies remain limited in early interaction stages, parallel arm execution, and transfer between simulation and the real world, and that DuoBench offers a reproducible testbed for diagnosing these failure modes.
📝 Abstract
Bimanual robot systems substantially expand manipulation capabilities, but coordinating two arms introduces additional control complexity and failure modes that are not well captured by existing benchmarks. We introduce DuoBench, an extensible benchmarking framework for bimanual manipulation policies on the FR3 Duo platform. DuoBench comprises eleven tasks spanning four coordination categories, implemented in simulation and partially reproduced in the real world through reproducible task recipes with 3D-printable assets. In addition, we propose a stage-based evaluation scheme that supports fine-grained semantic failure analysis beyond binary success and provide human-teleoperated datasets for all benchmark tasks. We benchmark several dual-arm imitation-learning and vision-language-action policies in simulation and on real hardware. Our results show that current policies remain challenged by bimanual manipulation, particularly in early interaction stages, parallel arm execution, and transfer between simulation and real-world settings. DuoBench provides a reproducible testbed for diagnosing these failure modes and studying future methods for dual-arm policy learning. Code, datasets, and videos are available at https://duobench.github.io/
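The stage-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not DuoBench's actual API: the `EpisodeResult` type, the `summarize` helper, the stage names, and the episode data are all hypothetical, and the sketch assumes each task is split into an ordered list of stages with every rollout reporting how many of them it completed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one rollout (hypothetical type, not from DuoBench)."""
    task: str
    stages_completed: int  # number of ordered stages finished before stopping


def first_failed_stage(result: EpisodeResult, stages: Sequence[str]) -> Optional[str]:
    """Return the stage at which the episode stopped, or None on full success."""
    if result.stages_completed >= len(stages):
        return None
    return stages[result.stages_completed]


def summarize(results: Sequence[EpisodeResult], stages: Sequence[str]) -> dict:
    """Report binary success, per-stage completion rates, and a failure histogram."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one episode")
    # Fraction of episodes that completed each stage (index i means i+1 stages done).
    completion = {
        name: sum(r.stages_completed > i for r in results) / n
        for i, name in enumerate(stages)
    }
    # Count how often each stage was the first one to fail.
    failures = Counter()
    for r in results:
        stage = first_failed_stage(r, stages)
        if stage is not None:
            failures[stage] += 1
    success = sum(r.stages_completed >= len(stages) for r in results) / n
    return {
        "success_rate": success,
        "stage_completion": completion,
        "failed_at": dict(failures),
    }


# Illustrative stages and made-up episodes for a handover-style task.
STAGES = ["reach", "grasp", "handover", "place"]
episodes = [
    EpisodeResult("handover", 4),
    EpisodeResult("handover", 1),
    EpisodeResult("handover", 1),
    EpisodeResult("handover", 3),
]
report = summarize(episodes, STAGES)
print(report)
```

In this made-up data the binary success rate alone (one episode in four) would hide that half of the rollouts already fail at the grasp stage, which is the kind of early-stage failure the abstract says a stage-based scheme is meant to expose.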
Problem

Research questions and friction points this paper is trying to address.

bimanual manipulation
benchmark
dual-arm coordination
sim-to-real transfer
reproducibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

bimanual manipulation
reproducible benchmark
stage-based evaluation
teleoperated dataset
sim-to-real transfer
Tobias Jülg
University of Technology Nuremberg
Seongjin Bien
University of Technology Nuremberg
Simon Hilber
Karlsruhe Institute of Technology
Yannik Blei
University of Technology Nuremberg
Pierre Krack
University of Technology Nuremberg
Maximilian Li
Karlsruhe Institute of Technology
Sven Parusel
Franka Robotics
Rudolf Lioutikov
TT-Professor, Intuitive Robots Lab, Karlsruhe Institute of Technology
Machine Learning, Robotics, Robot Learning, Reinforcement Learning, Imitation Learning
Florian Walter
University of Technology Nuremberg, Machine Intelligence Lab
Machine Intelligence, Robotics, Machine Learning, AI, Cognitive Robotics
Wolfram Burgard
Professor of Computer Science, University of Technology Nuremberg
Robotics, Artificial Intelligence, AI, Machine Learning, Computer Vision