🤖 AI Summary
This work addresses the temporal mismatch between static semantic conditioning and the dynamic denoising process in text-to-motion generation: early denoising steps need coarse-grained structural semantics to establish the overall motion, while later steps need fine-grained local semantics to align details with the text. To resolve this, we propose a *stage-wise semantic adaptation mechanism* inspired by epigenetic regulation in biology. Specifically, we introduce (i) a *temporal-aware semantic granularity partitioning module* based on spectral analysis; (ii) a *dynamic Classifier-Free Guidance scheduling strategy*; and (iii) a *temporal-semantic reweighting method* that quantitatively matches text influence to each denoising phase. To our knowledge, this is the first framework to systematically integrate developmental biology principles, particularly epigenetic regulation, into diffusion-based motion generation. The method can be applied to various baselines, where it significantly improves performance, and it achieves state-of-the-art text-motion semantic alignment when built on StableMoFusion.
📝 Abstract
While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics to lay motion foundations, while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis, where developmental phases demand distinct genetic programs. Inspired by the epigenetic regulation that governs morphological specialization, we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture. ANT orchestrates semantic granularity through: **(i) Semantic Temporally Adaptive (STA) Module:** Automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. **(ii) Dynamic Classifier-Free Guidance scheduling (DCFG):** Adaptively adjusts the conditional-to-unconditional ratio, enhancing efficiency while maintaining fidelity. **(iii) Temporal-semantic reweighting:** Quantitatively aligns text influence with phase requirements. Extensive experiments show that ANT can be applied to various baselines, significantly improving their performance and achieving state-of-the-art semantic alignment on StableMoFusion.
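To make the DCFG idea concrete, here is a minimal sketch of classifier-free guidance with a step-dependent guidance weight. It is not the schedule used by ANT, which the abstract does not specify. The function names `guidance_scale` and `cfg_combine`, the linear ramp, the direction of the ramp, and the default values `w_start` and `w_end` are all assumptions chosen for illustration. Only the combination rule (unconditional prediction plus a weighted conditional-minus-unconditional difference) is the standard classifier-free guidance formula.

```python
def guidance_scale(step, num_steps, w_start=2.0, w_end=7.5):
    """Hypothetical linear schedule for the guidance weight.

    `step` counts denoising progress: 0 is the first (noisiest) step and
    `num_steps - 1` is the last. The ramp direction and the endpoint
    values are illustrative assumptions, not values from the paper.
    """
    frac = step / max(num_steps - 1, 1)  # progress in [0, 1]
    return w_start + (w_end - w_start) * frac


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination.

    Works on plain floats as well as NumPy arrays or tensors, since it
    only uses elementwise arithmetic. w = 0 gives the unconditional
    prediction and w = 1 gives the conditional one.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


if __name__ == "__main__":
    num_steps = 50
    # Placeholder noise predictions; a real model would produce these.
    eps_uncond, eps_cond = 0.10, 0.30
    for step in (0, 25, 49):
        w = guidance_scale(step, num_steps)
        print(step, w, cfg_combine(eps_uncond, eps_cond, w))
```

In a real sampler, `cfg_combine` would be called once per denoising step with the model's two noise predictions, and `guidance_scale` could be replaced by any schedule, including one tied to the low-frequency and high-frequency phases that the STA module identifies.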