SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating physically plausible humanoid-object interactions (HOI) faces key challenges including unrealistic contact, limb interpenetration, and motion distortion. This paper proposes a synergistic optimization framework integrating generative modeling and reinforcement learning: a Diffusion Transformer generates long-horizon, semantically controllable HOI motion sequences, while a contact-aware whole-body RL controller refines motions in real time within physics simulation to ensure contact consistency and dynamic feasibility. The key innovation is a closed-loop “generation–feedback–optimization” mechanism, where keyframe prediction and contact-state guidance enhance trajectory-tracking robustness. Experiments demonstrate that the method significantly reduces physical violations, such as interpenetration and slippage, across diverse long-duration manipulation tasks, achieving high motion-tracking success rates. This work establishes a new paradigm for high-fidelity, embodied motion synthesis in humanoid robotics.

📝 Abstract
Generating physically realistic humanoid-object interactions (HOI) is a fundamental challenge in robotics. Existing HOI generation approaches, such as diffusion-based models, often suffer from artifacts such as implausible contacts, penetrations, and unrealistic whole-body actions, which hinder successful execution in physical environments. To address these challenges, we introduce SimGenHOI, a unified framework that combines the strengths of generative modeling and reinforcement learning to produce controllable and physically plausible HOI. Our HOI generative model, based on Diffusion Transformers (DiT), predicts a set of key actions conditioned on text prompts, object geometry, sparse object waypoints, and the initial humanoid pose. These key actions capture essential interaction dynamics and are interpolated into smooth motion trajectories, naturally supporting long-horizon generation. To ensure physical realism, we design a contact-aware whole-body control policy trained with reinforcement learning, which tracks the generated motions while correcting artifacts such as penetration and foot sliding. Furthermore, we introduce a mutual fine-tuning strategy, where the generative model and the control policy iteratively refine each other, improving both motion realism and tracking robustness. Extensive experiments demonstrate that SimGenHOI generates realistic, diverse, and physically plausible humanoid-object interactions, achieving significantly higher tracking success rates in simulation and enabling long-horizon manipulation tasks. Code will be released upon acceptance on our project page: https://xingxingzuo.github.io/simgen_hoi.
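The abstract notes that the DiT predicts sparse key actions which are then interpolated into smooth motion trajectories. As a minimal illustrative sketch (the paper does not specify its interpolation scheme; the smoothstep blend and the pose/time representation below are assumptions), dense trajectories can be recovered from keyframes like this:

```python
import numpy as np

def interpolate_key_actions(key_poses, key_times, fps=30):
    """Interpolate sparse key actions into a dense, smooth trajectory.

    key_poses: (K, D) array of key whole-body poses (D degrees of freedom).
    key_times: (K,) monotonically increasing timestamps in seconds.
    Uses a C1-continuous smoothstep blend between consecutive keys; this is
    a stand-in, not the interpolation the paper actually uses.
    """
    t_dense = np.arange(key_times[0], key_times[-1], 1.0 / fps)
    traj = np.empty((len(t_dense), key_poses.shape[1]))
    for i, t in enumerate(t_dense):
        # Find the keyframe segment containing time t.
        j = np.searchsorted(key_times, t, side="right") - 1
        j = min(j, len(key_times) - 2)
        u = (t - key_times[j]) / (key_times[j + 1] - key_times[j])
        w = u * u * (3.0 - 2.0 * u)  # smoothstep: zero velocity at keys
        traj[i] = (1.0 - w) * key_poses[j] + w * key_poses[j + 1]
    return t_dense, traj
```

Because each blend starts and ends with zero velocity at the keyframes, the dense trajectory passes exactly through every key action, which matches the paper's framing of key actions as the carriers of the essential interaction dynamics.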
Problem

Research questions and friction points this paper is trying to address.

Generating physically realistic humanoid-object interactions without artifacts
Addressing implausible contacts, penetrations, and unrealistic whole-body actions
Ensuring motion trajectories are controllable and physically plausible
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines generative modeling and reinforcement learning
Uses Diffusion Transformers for key action prediction
Mutual fine-tuning strategy enhances realism and robustness
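The mutual fine-tuning strategy above can be sketched as a simple alternating loop: the generator proposes motions, the RL policy tracks them in simulation, and each is updated from the other's output. The interfaces below (sample/track/finetune/train and the stub classes) are hypothetical, standing in for the paper's actual components:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    trajectory: list   # physically corrected motion from simulation
    success: bool      # whether the policy tracked the motion

class StubGenerator:
    """Stand-in for the DiT motion generator (hypothetical interface)."""
    def __init__(self):
        self.finetune_data: List[list] = []
    def sample(self, task):
        return [task, task]  # dummy "motion" for the sketch
    def finetune(self, trajectories):
        self.finetune_data.extend(trajectories)

class StubPolicy:
    """Stand-in for the contact-aware whole-body tracking policy."""
    def __init__(self):
        self.train_calls = 0
    def track(self, motion, sim):
        return Rollout(trajectory=motion, success=True)
    def train(self, motions):
        self.train_calls += 1

def mutual_finetune(generator, policy, sim, tasks, rounds=3):
    """Closed-loop generation-feedback-optimization sketch."""
    for _ in range(rounds):
        motions = [generator.sample(t) for t in tasks]      # generate
        rollouts = [policy.track(m, sim) for m in motions]  # feedback
        # Fine-tune the generator only on motions the policy tracked
        # successfully, i.e. on physically feasible supervision.
        generator.finetune([r.trajectory for r in rollouts if r.success])
        policy.train(motions)                               # optimize
    return generator, policy
```

The key design point is the filter in step three: the generator is refined only on trajectories the controller could physically execute, while the controller keeps training on the generator's newest motions, so both improve together.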
Yuhang Lin, Zhejiang University
Yijia Xie, Zhejiang University
Jiahong Xie, Zhejiang University
Yuehao Huang, Zhejiang University
Ruoyu Wang, Zhejiang University
Jiajun Lv, Zhejiang University (SLAM)
Yukai Ma, Zhejiang University
Xingxing Zuo, Assistant Professor @ MBZUAI (Robotics, State Estimation, Embodied AI)