SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating physically plausible humanoid-object interactions (HOI) faces key challenges including unrealistic contact, limb interpenetration, and motion distortion. This paper proposes a synergistic optimization framework integrating generative modeling and reinforcement learning: a Diffusion Transformer generates long-horizon, semantically controllable HOI motion sequences, while a contact-aware whole-body RL controller refines motions in real time within physics simulation to ensure contact consistency and dynamic feasibility. The key innovation is a closed-loop “generation–feedback–optimization” mechanism, where keyframe prediction and contact-state guidance enhance trajectory-tracking robustness. Experiments demonstrate that the method significantly reduces physical violations, such as interpenetration and slippage, across diverse long-duration manipulation tasks, achieving high motion-tracking success rates. This work establishes a new paradigm for high-fidelity, embodied motion synthesis in humanoid robotics.

📝 Abstract
Generating physically realistic humanoid-object interactions (HOI) is a fundamental challenge in robotics. Existing HOI generation approaches, such as diffusion-based models, often suffer from artifacts such as implausible contacts, penetrations, and unrealistic whole-body actions, which hinder successful execution in physical environments. To address these challenges, we introduce SimGenHOI, a unified framework that combines the strengths of generative modeling and reinforcement learning to produce controllable and physically plausible HOI. Our HOI generative model, based on Diffusion Transformers (DiT), predicts a set of key actions conditioned on text prompts, object geometry, sparse object waypoints, and the initial humanoid pose. These key actions capture essential interaction dynamics and are interpolated into smooth motion trajectories, naturally supporting long-horizon generation. To ensure physical realism, we design a contact-aware whole-body control policy trained with reinforcement learning, which tracks the generated motions while correcting artifacts such as penetration and foot sliding. Furthermore, we introduce a mutual fine-tuning strategy, where the generative model and the control policy iteratively refine each other, improving both motion realism and tracking robustness. Extensive experiments demonstrate that SimGenHOI generates realistic, diverse, and physically plausible humanoid-object interactions, achieving significantly higher tracking success rates in simulation and enabling long-horizon manipulation tasks. Code will be released upon acceptance on our project page: https://xingxingzuo.github.io/simgen_hoi.
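The abstract notes that the DiT predicts sparse key actions which are then interpolated into smooth motion trajectories. As a minimal illustrative sketch (the paper does not specify its interpolation scheme; the smoothstep blend and the pose/time representation below are assumptions), dense trajectories can be recovered from keyframes like this:

```python
import numpy as np

def interpolate_key_actions(key_poses, key_times, fps=30):
    """Interpolate sparse key actions into a dense, smooth trajectory.

    key_poses: (K, D) array of key whole-body poses (D degrees of freedom).
    key_times: (K,) monotonically increasing timestamps in seconds.
    Uses a C1-continuous smoothstep blend between consecutive keys; this is
    a stand-in, not the interpolation the paper actually uses.
    """
    t_dense = np.arange(key_times[0], key_times[-1], 1.0 / fps)
    traj = np.empty((len(t_dense), key_poses.shape[1]))
    for i, t in enumerate(t_dense):
        # Find the keyframe segment containing time t.
        j = np.searchsorted(key_times, t, side="right") - 1
        j = min(j, len(key_times) - 2)
        u = (t - key_times[j]) / (key_times[j + 1] - key_times[j])
        w = u * u * (3.0 - 2.0 * u)  # smoothstep: zero velocity at keys
        traj[i] = (1.0 - w) * key_poses[j] + w * key_poses[j + 1]
    return t_dense, traj
```

Because each blend starts and ends with zero velocity at the keyframes, the dense trajectory passes exactly through every key action, which matches the paper's framing of key actions as the carriers of the essential interaction dynamics.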
Problem

Research questions and friction points this paper is trying to address.

Generating physically realistic humanoid-object interactions without artifacts
Addressing implausible contacts, penetrations, and unrealistic whole-body actions
Ensuring motion trajectories are controllable and physically plausible
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines generative modeling and reinforcement learning
Uses Diffusion Transformers for key action prediction
Mutual fine-tuning strategy enhances realism and robustness
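The mutual fine-tuning strategy above can be sketched as a simple alternating loop: the generator proposes motions, the RL policy tracks them in simulation, and each is updated from the other's output. The interfaces below (sample/track/finetune/train and the stub classes) are hypothetical, standing in for the paper's actual components:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    trajectory: list   # physically corrected motion from simulation
    success: bool      # whether the policy tracked the motion

class StubGenerator:
    """Stand-in for the DiT motion generator (hypothetical interface)."""
    def __init__(self):
        self.finetune_data: List[list] = []
    def sample(self, task):
        return [task, task]  # dummy "motion" for the sketch
    def finetune(self, trajectories):
        self.finetune_data.extend(trajectories)

class StubPolicy:
    """Stand-in for the contact-aware whole-body tracking policy."""
    def __init__(self):
        self.train_calls = 0
    def track(self, motion, sim):
        return Rollout(trajectory=motion, success=True)
    def train(self, motions):
        self.train_calls += 1

def mutual_finetune(generator, policy, sim, tasks, rounds=3):
    """Closed-loop generation-feedback-optimization sketch."""
    for _ in range(rounds):
        motions = [generator.sample(t) for t in tasks]      # generate
        rollouts = [policy.track(m, sim) for m in motions]  # feedback
        # Fine-tune the generator only on motions the policy tracked
        # successfully, i.e. on physically feasible supervision.
        generator.finetune([r.trajectory for r in rollouts if r.success])
        policy.train(motions)                               # optimize
    return generator, policy
```

The key design point is the filter in step three: the generator is refined only on trajectories the controller could physically execute, while the controller keeps training on the generator's newest motions, so both improve together.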
Yuhang Lin, Zhejiang University
Yijia Xie, Zhejiang University
Jiahong Xie, Zhejiang University
Yuehao Huang, Zhejiang University
Ruoyu Wang, Zhejiang University
Jiajun Lv, Zhejiang University (SLAM)
Yukai Ma, Zhejiang University
Xingxing Zuo, Assistant Professor @ MBZUAI (Robotics, State Estimation, Embodied AI)