ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the mismatch between static semantic conditioning and the changing demands of the denoising process in text-to-motion generation: early denoising steps need coarse-grained structural semantics to establish the overall motion, while later steps need fine-grained local semantics to align the motion with textual details. To resolve this, we propose a *stage-wise semantic adaptation mechanism* inspired by epigenetic regulation in biological development. Specifically, we introduce (i) a *temporal-aware semantic granularity partitioning module* based on spectral analysis; (ii) a *dynamic Classifier-Free Guidance scheduling strategy*; and (iii) a *temporal-semantic reweighting method* that quantitatively aligns text influence with the requirements of each phase. To our knowledge, this is the first framework to systematically integrate developmental biology principles, particularly epigenetic regulation, into diffusion-based motion generation. Applied to several baselines, our method significantly improves text-motion semantic alignment and achieves state-of-the-art semantic alignment on StableMoFusion.

📝 Abstract
While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics for motion foundations, while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis, where developmental phases demand distinct genetic programs. Inspired by the epigenetic regulation that governs morphological specialization, we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture. ANT orchestrates semantic granularity through: **(i) Semantic Temporally Adaptive (STA) Module:** Automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. **(ii) Dynamic Classifier-Free Guidance scheduling (DCFG):** Adaptively adjusts the conditional-to-unconditional ratio, enhancing efficiency while maintaining fidelity. **(iii) Temporal-semantic reweighting:** Quantitatively aligns text influence with phase requirements. Extensive experiments show that ANT can be applied to various baselines, significantly improving their performance and achieving state-of-the-art semantic alignment on StableMoFusion.
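To make the DCFG idea concrete, here is a minimal Python sketch that varies the classifier-free guidance scale across denoising steps. Only the guidance combination `eps_uncond + w * (eps_cond - eps_uncond)` is the standard classifier-free guidance formula. Everything else is an assumption made for illustration: the function names `guidance_scale` and `cfg_combine`, the cosine shape of the schedule, its increasing direction, and the bounds `w_min` and `w_max` are hypothetical and are not taken from the paper.

```python
import math


def guidance_scale(step, num_steps, w_min=1.0, w_max=7.5):
    """Hypothetical guidance schedule (an assumption, not the paper's DCFG).

    step 0 is the first (noisiest) denoising step and num_steps - 1 the last.
    The scale rises smoothly from w_min to w_max along a half cosine.
    """
    progress = step / max(num_steps - 1, 1)
    return w_min + 0.5 * (w_max - w_min) * (1.0 - math.cos(math.pi * progress))


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance applied element-wise to two noise predictions."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy usage: the two lists stand in for a model's unconditional and
# text-conditioned noise predictions at a single denoising step.
num_steps = 50
for step in (0, 25, 49):
    w = guidance_scale(step, num_steps)
    print(step, round(w, 3), cfg_combine([0.0, 1.0], [1.0, 3.0], w))
```

With `w = 1` the combination reduces to the conditional prediction alone, so a schedule of this kind effectively controls how strongly the text pulls each step away from the unconditional estimate.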
Problem

Research questions and friction points this paper is trying to address.

Mismatch between static semantic conditioning and temporal-frequency demands in text-to-motion generation
Need for adaptive semantic granularity during different denoising stages
Improving semantic alignment and efficiency in diffusion-based motion generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

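Semantic Temporally Adaptive (STA) Module that partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis
Dynamic Classifier-Free Guidance (DCFG) scheduling that adjusts the conditional-to-unconditional ratio during denoising
Temporal-semantic reweighting that aligns text influence with the requirements of each denoising phase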
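The spectral-analysis idea behind the STA module can be illustrated with a small Python sketch that measures how much of a 1-D trajectory's energy sits in its lowest frequency bins and labels the signal accordingly. This is only a toy under stated assumptions: the function names `low_freq_energy_ratio` and `semantic_stage`, the `cutoff` and `threshold` values, and the decision rule itself are hypothetical, and the paper's actual partitioning procedure may differ substantially.

```python
import cmath
import math


def low_freq_energy_ratio(signal, cutoff):
    """Share of one-sided DFT energy in bins 0..cutoff of a real 1-D signal."""
    n = len(signal)
    energy = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT coefficient for bin k; fine for short toy signals.
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy.append(abs(coeff) ** 2)
    total = sum(energy)
    return sum(energy[:cutoff + 1]) / total if total > 0 else 0.0


def semantic_stage(signal, cutoff=2, threshold=0.5):
    """Hypothetical rule: mostly low-frequency content maps to the structural stage."""
    ratio = low_freq_energy_ratio(signal, cutoff)
    return "structural" if ratio >= threshold else "refinement"


# Toy joint trajectories: one slow oscillation versus eight fast ones.
slow = [math.sin(2 * math.pi * t / 32) for t in range(32)]
fast = [math.sin(2 * math.pi * 8 * t / 32) for t in range(32)]
print(semantic_stage(slow), semantic_stage(fast))
```

A real system would likely analyze full multi-joint motion features rather than a single scalar trajectory; the sketch only shows the kind of frequency-domain statistic such a partition could be built on.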
Authors

Wenshuo Chen
Shandong University undergraduate student
Generative Models, XAI

Kuimou Yu
HKUST(GZ), Guangzhou, China

Haozhe Jia
HKUST(GZ), Guangzhou, China

Kaishen Yuan
HKUST(GZ), Guangzhou, China

Bowen Tian
The Hong Kong University of Science and Technology (Guangzhou)
Model Fusion, Neural Network Functionals, Semi-Supervised Learning

Songning Lai
HKUST(GZ)
Machine Learning, Deep Learning, Multimodal, XAI

Hongru Xiao
Tongji University
LLMs, ALMs, Speech

Erhang Zhang
Shandong University, Qingdao, China

Lei Wang
Australian National University & Data61/CSIRO, Canberra, Australia

Yutao Yue
Thrust of Artificial Intelligence and Thrust of Intelligent Transportation; HKUST(GZ), Guangzhou, China