RoboSeek: You Need to Interact with Your Objects

📅 2025-09-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing the challenges of sequential decision-making, physical constraints, and perceptual uncertainty in long-horizon robotic manipulation tasks, this paper introduces RoboSeek, an embodied manipulation framework driven by interactive experience. RoboSeek establishes a real2sim2real transfer pipeline: 3D reconstruction replicates real-world environments as visually and physically consistent simulations, policies are trained in simulation using reinforcement learning and the cross-entropy method with visual priors, and the learned policies are then deployed on real robots. Evaluated on eight long-horizon manipulation tasks across multiple robotic platforms, RoboSeek achieves an average success rate of 79%, while baseline success rates remain below 50%. The results indicate robustness in dynamic real-world settings and generalization across tasks and platforms.

📝 Abstract
Optimizing and refining action execution through exploration and interaction is a promising approach to robotic manipulation. However, practical approaches to interaction-driven robotic learning are still underexplored, particularly for long-horizon tasks where sequential decision-making, physical constraints, and perceptual uncertainties pose significant challenges. Motivated by embodied cognition theory, we propose RoboSeek, a framework for embodied action execution that leverages interactive experience to accomplish manipulation tasks. RoboSeek optimizes prior knowledge from high-level perception models through closed-loop training in simulation and achieves robust real-world execution via a real2sim2real transfer pipeline. Specifically, we first replicate real-world environments in simulation using 3D reconstruction to provide visually and physically consistent environments. We then train policies in simulation using reinforcement learning and the cross-entropy method, leveraging visual priors. The learned policies are subsequently deployed on real robotic platforms for execution. RoboSeek is hardware-agnostic and is evaluated on multiple robotic platforms across eight long-horizon manipulation tasks involving sequential interactions, tool use, and object handling. Our approach achieves an average success rate of 79%, significantly outperforming baselines whose success rates remain below 50%, highlighting its generalization and robustness across tasks and platforms. Experimental results validate the effectiveness of our training framework in complex, dynamic real-world settings and demonstrate the stability of the proposed real2sim2real transfer mechanism, paving the way for more generalizable embodied robotic learning. Project Page: https://russderrick.github.io/Roboseek/
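The abstract names the cross-entropy method (CEM) but does not describe how it is combined with reinforcement learning or how the visual priors enter. As a rough illustration only, the sketch below shows a generic CEM loop under one assumption: that a perception model supplies an initial guess (for example a grasp offset), which seeds the sampling distribution and is then refined by repeated rollouts. The function `toy_rollout_cost`, the target values, and all hyperparameters are hypothetical stand-ins, not RoboSeek's implementation; in practice the cost would come from a simulation rollout.

```python
import random


def cross_entropy_method(cost, prior_mean, prior_std, iters=30, pop=64,
                         elite_frac=0.2, min_std=1e-3, seed=0):
    """Generic CEM: sample, keep the lowest-cost elites, refit a Gaussian."""
    rng = random.Random(seed)
    mean, std = list(prior_mean), list(prior_std)
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Sample candidate parameters around the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # Rank candidates by cost and keep the best fraction.
        elites = sorted(samples, key=cost)[:n_elite]
        # Refit a diagonal Gaussian to the elites.
        for d in range(len(mean)):
            vals = [e[d] for e in elites]
            mu = sum(vals) / n_elite
            var = sum((v - mu) ** 2 for v in vals) / n_elite
            mean[d] = mu
            std[d] = max(var ** 0.5, min_std)  # floor avoids total collapse
    return mean


def toy_rollout_cost(params):
    """Hypothetical stand-in for a simulated rollout: squared distance
    between the proposed offset and an (unknown to CEM) good offset."""
    target = (0.30, -0.10, 0.05)
    return sum((p - t) ** 2 for p, t in zip(params, target))


# Assumed "visual prior": a rough initial estimate with some uncertainty.
prior = [0.25, -0.05, 0.10]
best = cross_entropy_method(toy_rollout_cost, prior, [0.1, 0.1, 0.1])
```

Seeding the distribution with a prior rather than a broad uninformed guess is one plausible way a perception model could cut down the number of rollouts needed. The per-dimension standard deviation floor is a common practical safeguard, not something the abstract specifies.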
Problem

Research questions and friction points this paper is trying to address.

Addresses challenges in long-horizon robotic manipulation tasks
Optimizes action execution through interactive experience and exploration
Enables robust real-world execution via simulation-to-reality transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages interactive experience for robotic manipulation tasks
Uses real2sim2real transfer with 3D reconstruction and simulation
Trains policies via reinforcement learning with visual priors
Authors
Yibo Peng
Carnegie Mellon University
Code Generation, Multimodal NLP, AI Agents
Jiahao Yang
FNii-Shenzhen
Shenhao Yan
FNii-Shenzhen, Northeastern University
Ziyu Huang
FNii-Shenzhen, The Chinese University of Hong Kong, Shenzhen
Shuang Li
Infused Synapse AI
Shuguang Cui
Distinguished Presidential Chair Professor, School of Science and Engineering, CUHKSZ
AI+Networking, Wireless Communications
Yiming Zhao
FNii-Shenzhen, Harbin Engineering University, Infused Synapse AI
Yatong Han
Chinese University of Hong Kong (SZ)
Embodied AI, Neuro AI, Liquid Neural Networks, Computational Biology