SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

📅 2025-03-03
🤖 AI Summary
To address insufficient semantic plausibility and geometric accuracy in human interaction motion generation within dynamic environments, this paper proposes a diffusion-based generative framework guided jointly by text-, affordance-, and joint-level semantic and geometric cues. Methodologically, it integrates CLIP-based textual encoding, affordance modeling, and skeletal geometric constraint losses to enable multimodal conditional generation. By applying these levels of guidance together throughout motion synthesis, the method aims to keep generated motions consistent with the task intent while remaining physically feasible. Evaluated on three datasets, the approach achieves state-of-the-art performance and shows superior generalization across diverse interaction scenarios. Quantitative and qualitative results indicate better motion quality and environmental adaptability than existing methods.

📝 Abstract
Generating reasonable and high-quality human interactive motions in a given dynamic environment is crucial for understanding, modeling, transferring, and applying human behaviors to both virtual and physical robots. In this paper, we introduce an effective method, SemGeoMo, for dynamic contextual human motion generation, which fully leverages the text-affordance-joint multi-level semantic and geometric guidance in the generation process, improving the semantic rationality and geometric correctness of generated motions. Our method achieves state-of-the-art performance on three datasets and demonstrates superior generalization capability for diverse interaction scenarios. The project page and code can be found at https://4dvlab.github.io/project_page/semgeomo/.
Problem

Research questions and friction points this paper is trying to address.

Generating human interactive motions in dynamic environments.
Improving the semantic rationality and geometric correctness of generated motions.
Generalizing across diverse interaction scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages text-affordance-joint multi-level semantic and geometric guidance during generation
Improves semantic rationality and geometric correctness of generated motions
Achieves state-of-the-art performance on three datasets