Hoi3DGen: Generating High-Quality Human-Object-Interactions in 3D

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing text-to-3D human-object interaction generation methods: they typically rely on score distillation from text-to-image models, so their results suffer from the Janus problem and, given the scarcity of high-quality interaction data, do not follow text prompts faithfully. The authors introduce Hoi3DGen, a text-to-3D framework that first uses multimodal large language models to curate realistic, high-quality interaction data and then builds a full pipeline that generates textured meshes matching the input interaction description. The authors report that the method surpasses baselines by 4–15× in text consistency and 3–7× in 3D model quality, and that it generalizes to diverse object categories and interaction types.

📝 Abstract
Modeling and generating 3D human-object interactions from text is crucial for applications in AR, XR, and gaming. Existing approaches often rely on score distillation from text-to-image models, but their results suffer from the Janus problem and do not follow text prompts faithfully due to the scarcity of high-quality interaction data. We introduce Hoi3DGen, a framework that generates high-quality textured meshes of human-object interaction that follow the input interaction descriptions precisely. We first curate realistic and high-quality interaction data leveraging multimodal large language models, and then create a full text-to-3D pipeline, which achieves orders-of-magnitude improvements in interaction fidelity. Our method surpasses baselines by 4-15x in text consistency and 3-7x in 3D model quality, exhibiting strong generalization to diverse categories and interaction types, while maintaining high-quality 3D generation.

Problem

Research questions and friction points this paper is trying to address.

3D human-object interaction
text-to-3D generation
Janus problem
interaction fidelity
high-quality 3D modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

text-to-3D
human-object interaction
multimodal large language models
3D generation
interaction fidelity