VAIC: Vision-Guided Humanoid Agile Object Interaction Control via Decoupled Commands

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Humanoid robots struggle to achieve agile, generalizable object interaction in unstructured environments because existing approaches rely heavily on dense reference trajectories and full state observations, which limits real-world deployment. This work proposes VAIC, a framework that uses only onboard depth images, historical proprioception, and decoupled user commands, implicitly inferring object dynamics and unifying control across diverse dynamic interaction tasks. Through two-stage policy distillation and a recurrent object adaptation module, VAIC operates without perfect state information and consistently outperforms baselines on highly dynamic tasks such as box carrying, cart interaction, and skateboarding. The approach improves both the generalization and the practical utility of humanoid robots in real-world settings.
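The object adaptation module, which the abstract describes as recurrent, can be pictured as a cell that folds each frame's visual and proprioceptive input into a latent estimate of object dynamics. The sketch below is illustrative only. It assumes a plain Elman (tanh) recurrence, whereas the abstract does not say whether VAIC uses a GRU, an LSTM, or another cell. It also assumes that depth frames have already been reduced to feature vectors by some encoder. `RecurrentAdapter`, `encode_history`, and all dimensions are hypothetical names chosen for illustration. The weights are random placeholders, not trained parameters.

```python
import math
import random
from typing import List, Sequence


class RecurrentAdapter:
    """Hypothetical Elman-style cell: h_t = tanh(W_in x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(input_dim + latent_dim)
        # Random weights stand in for trained parameters (assumption).
        self.w_in = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
                     for _ in range(latent_dim)]
        self.w_h = [[rng.uniform(-scale, scale) for _ in range(latent_dim)]
                    for _ in range(latent_dim)]
        self.bias = [0.0] * latent_dim
        self.input_dim = input_dim
        self.latent_dim = latent_dim

    def initial_state(self) -> List[float]:
        return [0.0] * self.latent_dim

    def step(self, h: Sequence[float], x: Sequence[float]) -> List[float]:
        """One recurrent update from previous latent h and frame input x."""
        if len(x) != self.input_dim or len(h) != self.latent_dim:
            raise ValueError("unexpected input or state size")
        new_h = []
        for i in range(self.latent_dim):
            pre = self.bias[i]
            pre += sum(w * xi for w, xi in zip(self.w_in[i], x))
            pre += sum(w * hj for w, hj in zip(self.w_h[i], h))
            new_h.append(math.tanh(pre))
        return new_h


def encode_history(adapter: RecurrentAdapter,
                   depth_feats: Sequence[Sequence[float]],
                   proprio: Sequence[Sequence[float]]) -> List[float]:
    """Run the cell over a window of frames and return the final latent."""
    h = adapter.initial_state()
    for d, p in zip(depth_feats, proprio):
        h = adapter.step(h, list(d) + list(p))  # per-frame concatenation
    return h


# Usage: 10 frames of 3-D depth features plus 2-D proprioception.
adapter = RecurrentAdapter(input_dim=5, latent_dim=4)
depth = [[0.1, 0.2, 0.3]] * 10
proprio = [[0.0, 1.0]] * 10
z = encode_history(adapter, depth, proprio)
print(len(z))  # 4
```

Carrying a hidden state across frames is what lets a policy infer quantities no single depth image reveals, such as an object's mass or friction, from how the object and the robot's own joints respond over time.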
📝 Abstract
Humanoid robots hold immense potential for real-world assistance, yet agile interaction with objects in unstructured environments demands tightly coupled whole-body coordination. Despite recent advances, current controllers face a critical deployment gap: they rely heavily on dense reference trajectories and perfect state observability, which inherently limits physical generalization. We present Vision-Guided Agile Interaction Control (VAIC), a unified framework that bridges this gap by operating exclusively on onboard depth, historical proprioception, and a decoupled user command interface. VAIC employs a two-stage distillation paradigm. First, a privileged teacher policy masters diverse interaction skills using precise object kinematics and exact environmental states. Second, a deployable student policy distills these capabilities, replacing full-body tracking with velocity targets across multiple axes and an interaction indicator for each frame. The student uses a recurrent object adaptation module to implicitly infer unobservable object dynamics from raw depth streams and proprioception. Evaluations and real-world deployments on the humanoid robot demonstrate that a single VAIC policy executes highly diverse dynamic tasks, including box carrying, cart interaction, and skateboarding, consistently outperforming baselines and advancing autonomous humanoid deployment.
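The student side of the two-stage distillation can be sketched as follows. The student consumes only deployable signals plus a decoupled command, and it is trained to match the privileged teacher's actions. This is a minimal sketch under stated assumptions. The abstract says only "velocity targets across multiple axes and an interaction indicator for each frame." The choice of forward, lateral, and yaw axes and the names `DecoupledCommand`, `student_input`, and `distillation_loss` are therefore hypothetical. The objective shown is plain mean-squared action matching (behavior cloning), a standard distillation loss swapped in for illustration because VAIC's exact loss and data-collection scheme (e.g. whether it is DAgger-style) are not stated here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DecoupledCommand:
    """Hypothetical per-frame command for the deployable student policy."""
    lin_vel_x: float   # assumed forward velocity target (m/s)
    lin_vel_y: float   # assumed lateral velocity target (m/s)
    yaw_rate: float    # assumed turning-rate target (rad/s)
    interact: bool     # per-frame interaction indicator

    def to_vector(self) -> List[float]:
        return [self.lin_vel_x, self.lin_vel_y, self.yaw_rate,
                1.0 if self.interact else 0.0]


def student_input(depth_feat: Sequence[float],
                  proprio_hist: Sequence[float],
                  cmd: DecoupledCommand) -> List[float]:
    # Only deployable signals: no object kinematics or privileged state.
    return list(depth_feat) + list(proprio_hist) + cmd.to_vector()


def distillation_loss(student_actions: Sequence[Sequence[float]],
                      teacher_actions: Sequence[Sequence[float]]) -> float:
    """Mean squared error between student and teacher actions over a batch."""
    if len(student_actions) != len(teacher_actions) or not student_actions:
        raise ValueError("need equally sized, non-empty action batches")
    total, count = 0.0, 0
    for s, t in zip(student_actions, teacher_actions):
        if len(s) != len(t):
            raise ValueError("action dimensions differ")
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty action vectors")
    return total / count


cmd = DecoupledCommand(0.5, 0.0, 0.2, interact=True)
print(cmd.to_vector())                                # [0.5, 0.0, 0.2, 1.0]
print(len(student_input([0.1, 0.2], [0.3], cmd)))     # 7
print(distillation_loss([[0.0, 1.0]], [[1.0, 1.0]]))  # 0.5
```

Keeping the command low-dimensional (a few velocity targets and one flag) is what makes it practical for a user to issue at run time, in contrast to the dense reference trajectories the teacher-free baselines depend on.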
Problem

Research questions and friction points this paper is trying to address.

humanoid robots
agile object interaction
state observability
reference trajectories
physical generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-guided control
decoupled commands
policy distillation
object dynamics inference
humanoid agility
🔎 Similar Papers
2024-07-16, Neural Information Processing Systems, Citations: 16
Dongting Li
Tsinghua University
Qianyang Wu
Xiaomi Robotics Lab
Xingyu Chen
HKUST(Guangzhou)
Liang Li
Xiaomi Robotics Lab
Yuhang Lin
Xiaomi Robotics Lab
Sikai Wu
Xiaomi Robotics Lab
Guoyao Zhang
Xiaomi Robotics Lab
Mingliang Zhou
Xiaomi Robotics Lab
Diyun Xiang
Xiaomi Robotics Lab
Qiang Zhang
HKUST(Guangzhou)
Renjing Xu
HKUST(GZ)
Brain-inspired Computing, Humanoid Computing
Jianzhu Ma
Tsinghua University
Machine Learning, Computational Biology, Bioinformatics