ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the temporal misalignment between static semantic conditioning and dynamic denoising in text-to-motion generation: coarse-grained structural semantics are required early for skeletal motion modeling, while fine-grained local semantics are needed later for precise alignment with textual detail. To resolve this, we propose a *stage-wise semantic adaptation mechanism* inspired by epigenetic regulation in biology. Specifically, we introduce (i) a *temporal-aware semantic granularity partitioning module* based on spectral analysis; (ii) a *dynamic Classifier-Free Guidance scheduling strategy*; and (iii) a *spatiotemporal-coordinated quantitative semantic reweighting method*. To our knowledge, this is the first framework to systematically integrate developmental biology principles, particularly epigenetic regulation, into diffusion-based motion generation. Applied to baselines including StableMoFusion, our method significantly improves text-motion semantic alignment and achieves state-of-the-art semantic alignment on StableMoFusion.

📝 Abstract

While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics for motion foundations, while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis, where developmental phases demand distinct genetic programs. Inspired by the epigenetic regulation that governs morphological specialization, we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture. ANT orchestrates semantic granularity through three components. **(i) Semantic Temporally Adaptive (STA) Module:** Automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. **(ii) Dynamic Classifier-Free Guidance scheduling (DCFG):** Adaptively adjusts the conditional-to-unconditional ratio, enhancing efficiency while maintaining fidelity. **(iii) Temporal-semantic reweighting:** Quantitatively aligns text influence with phase requirements. Extensive experiments show that ANT can be applied to various baselines, significantly improving their performance and achieving state-of-the-art semantic alignment on StableMoFusion.
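The DCFG idea can be illustrated with a minimal sketch. Only the combination rule `eps_uncond + w * (eps_cond - eps_uncond)` is standard classifier-free guidance; the cosine schedule, its direction, the default bounds, and the names `guidance_scale` and `cfg_combine` are assumptions made for illustration and are not taken from the paper.

```python
import math


def guidance_scale(step, num_steps, w_min=1.0, w_max=7.5):
    """Hypothetical schedule: guidance weight as a function of denoising progress.

    step = 0 is the first (noisiest) denoising step, step = num_steps - 1 the last.
    The cosine shape and the low-to-high direction are assumptions, not the paper's rule.
    """
    if num_steps < 2:
        return w_max
    progress = step / (num_steps - 1)  # runs from 0.0 to 1.0
    return w_min + (w_max - w_min) * 0.5 * (1.0 - math.cos(math.pi * progress))


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance applied element-wise to two noise predictions."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions standing in for the outputs of a denoising network.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
for step in (0, 25, 49):
    w = guidance_scale(step, num_steps=50)
    print(step, round(w, 3), cfg_combine(eps_uncond, eps_cond, w))
```

One plausible source of the efficiency gain mentioned above: when `w` equals 1 the combination reduces to `eps_cond`, so the unconditional forward pass could be skipped at those steps. Whether ANT exploits this is not stated in the abstract.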

Problem

Research questions and friction points this paper is trying to address.

Mismatch between static semantic conditioning and temporal-frequency demands in text-to-motion generation

Need for adaptive semantic granularity during different denoising stages

Improving semantic alignment and efficiency in diffusion-based motion generation
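To make the "temporal-frequency demands" above more concrete, the sketch below measures how much of a motion trajectory's spectral energy sits in its lowest frequency bins. It is a generic discrete Fourier transform computation, not the paper's STA module; the function name `low_freq_energy_ratio`, the `cutoff_bins` parameter, and the toy signals are hypothetical.

```python
import cmath
import math


def low_freq_energy_ratio(signal, cutoff_bins):
    """Fraction of non-DC spectral energy in bins 1..cutoff_bins of a real 1D signal.

    `signal` could be one joint coordinate sampled over frames (an assumption).
    """
    n = len(signal)
    half = n // 2  # bins above n/2 mirror the lower ones for real input
    energies = []
    for k in range(1, half + 1):
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total == 0.0:
        return 0.0
    return sum(energies[:cutoff_bins]) / total


# A slow sway (1 cycle per 32 frames) versus a fast jitter (10 cycles per 32 frames).
slow = [math.sin(2 * math.pi * t / 32) for t in range(32)]
fast = [math.sin(2 * math.pi * 10 * t / 32) for t in range(32)]
print(low_freq_energy_ratio(slow, cutoff_bins=2))
print(low_freq_energy_ratio(fast, cutoff_bins=2))
```

A ratio near 1 indicates coarse, structural motion and a ratio near 0 indicates fine detail. How ANT turns such a measurement into a boundary between denoising phases is not described in this summary.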

Innovation

Methods, ideas, or system contributions that make the work stand out.

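Semantic Temporally Adaptive (STA) Module that partitions denoising into low-frequency structural planning and high-frequency refinement

Dynamic Classifier-Free Guidance scheduling (DCFG) that adjusts the conditional-to-unconditional ratio

Temporal-semantic reweighting that aligns text influence with phase requirements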
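As a rough illustration of phase-dependent text influence, the sketch below shifts weight from a coarse text embedding to a fine-grained one as denoising progresses. The linear blend, the function name `blend_text_conditions`, and the notion of separate coarse and fine embeddings are assumptions for illustration; the paper's actual reweighting rule is not given here.

```python
def blend_text_conditions(coarse, fine, step, num_steps):
    """Linearly shift weight from a coarse embedding to a fine one over denoising.

    step = 0 uses only `coarse`; step = num_steps - 1 uses only `fine`.
    """
    if len(coarse) != len(fine):
        raise ValueError("embeddings must have the same length")
    progress = step / (num_steps - 1) if num_steps > 1 else 1.0
    return [(1.0 - progress) * c + progress * f for c, f in zip(coarse, fine)]


# Toy 2-dimensional embeddings standing in for real text-encoder outputs.
coarse = [1.0, 0.0]
fine = [0.0, 1.0]
for step in (0, 2, 4):
    print(step, blend_text_conditions(coarse, fine, step, num_steps=5))
```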
