HOIST: Humanoid Optimization with Imitation and Sample-efficient Tuning for Manipulating Suspended Loads

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the manipulation of suspended payloads with humanoid robots, a task in which the robot can influence an underactuated, oscillatory load only through whole-body motion and intermittent contact. The authors propose HOIST, which first finetunes a high-level vision-language-action (VLA) policy on virtual-reality teleoperation demonstrations and executes its commands through a whole-body controller, then refines the policy with iterative batched reinforcement learning on VLA rollouts. Imitation supplies safe initial behavior, while the RL stage optimizes final placement in a sample-efficient way. In simulation and on a real humanoid, HOIST improves placement accuracy and stopping behavior over imitation-only and additional-demonstration baselines; compared with pure VLA rollouts, it reduces translational placement error by 19.9 cm and raw angular error by 3.56°.

📝 Abstract

Manipulating suspended payloads with humanoid robots is challenging because the robot can only influence an underactuated, oscillatory load through whole-body motion and intermittent contact. Imitation learning provides safe initial behavior but does not directly optimize final placement, while reinforcement learning from scratch is unsafe and sample-inefficient on real humanoids. We present HOIST (Humanoid Optimized with Imitation and Sample-efficient Tuning) for manipulating suspended loads. HOIST first finetunes a high-level vision-language-action (VLA) policy from virtual-reality (VR) teleoperation demonstrations and executes its commands through a whole-body controller. It then uses VLA rollouts and iterative batched RL to improve placement accuracy and stopping behavior. Experiments in simulation and on a real humanoid show that HOIST improves over imitation-only and additional-demonstration baselines; compared with pure VLA rollouts, HOIST reduces translational placement error by 19.9 cm and raw angular error by 3.56 degrees, demonstrating the potential of humanoids for underactuated material-handling tasks.
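The abstract reports results as a translational placement error (in cm) and an angular error (in degrees) but does not define either metric. The sketch below shows one plausible way such errors could be computed for a planar placement. It assumes the translational error is the Euclidean distance between final and target positions and the angular error is the wrapped yaw difference. The function name `placement_errors` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def placement_errors(final_xy_cm, final_yaw_deg, target_xy_cm, target_yaw_deg):
    """Hypothetical metric sketch, not the paper's definition.

    Returns (translational error in cm, absolute angular error in degrees)
    for a payload placed on a plane.
    """
    dx = final_xy_cm[0] - target_xy_cm[0]
    dy = final_xy_cm[1] - target_xy_cm[1]
    # Assumed: translational error is the Euclidean distance to the target.
    translational_cm = math.hypot(dx, dy)
    # Wrap the yaw difference into [-180, 180) so that 359 deg vs 1 deg
    # counts as a 2 deg error rather than 358 deg.
    yaw_diff = (final_yaw_deg - target_yaw_deg + 180.0) % 360.0 - 180.0
    return translational_cm, abs(yaw_diff)


if __name__ == "__main__":
    t_err, a_err = placement_errors((3.0, 4.0), 359.0, (0.0, 0.0), 1.0)
    print(f"translational error: {t_err:.2f} cm, angular error: {a_err:.2f} deg")
```

Under these assumptions, a reported reduction such as 19.9 cm would be the difference between the mean translational error of the baseline rollouts and that of the refined policy, averaged over trials.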

Problem

Research questions and friction points this paper is trying to address.

suspended loads

humanoid manipulation

underactuated systems

imitation learning

sample efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

imitation learning

sample-efficient reinforcement learning

humanoid manipulation