🤖 AI Summary
This work addresses the risk of toxic motion generation in text-to-motion synthesis by introducing the task of “human motion unlearning.” To evaluate this task systematically, we construct the first motion unlearning benchmark and build baselines by adapting image unlearning techniques to spatio-temporal signals. We further propose Latent Code Replacement (LCR), a training-free method suited to the discrete latent spaces of text-to-motion diffusion models. LCR performs semantic-aware latent code remapping to suppress toxic motions, whether they are requested explicitly or arise implicitly from combinations of safe motions. Qualitative and quantitative evaluations on HumanML3D and Motion-X show that LCR consistently outperforms the baselines while preserving overall generation quality and diversity. The approach offers a fine-tuning-free route to safe and controllable motion generation.
📝 Abstract
We introduce the task of human motion unlearning to prevent the synthesis of toxic animations while preserving general text-to-motion generative performance. Unlearning toxic motions is challenging because they can be generated both from explicit text prompts and from implicit toxic combinations of safe motions (e.g., "kicking" is "loading and swinging a leg"). We propose the first motion unlearning benchmark by filtering toxic motions from the large and recent text-to-motion datasets HumanML3D and Motion-X. We propose baselines by adapting state-of-the-art image unlearning techniques to process spatio-temporal signals. Finally, we propose a novel motion unlearning model based on Latent Code Replacement, which we dub LCR. LCR is training-free and suited to the discrete latent spaces of state-of-the-art text-to-motion diffusion models. LCR is simple and consistently outperforms baselines qualitatively and quantitatively. Project page: https://www.pinlab.org/hmu.
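The general idea of replacing codes in a discrete latent space can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy codebook, the set of flagged indices, and the nearest-neighbour substitution rule are all assumptions made for illustration, and the abstract does not specify how LCR selects replacement codes.

```python
from math import dist


def build_replacement_table(codebook, flagged):
    """Map every code index to itself, except flagged ones.

    Hypothetical rule (an assumption, not taken from the paper): each
    flagged index is redirected to the nearest unflagged codebook entry
    under Euclidean distance.
    """
    flagged = set(flagged)
    safe = [i for i in range(len(codebook)) if i not in flagged]
    if not safe:
        raise ValueError("at least one code must remain unflagged")
    table = list(range(len(codebook)))
    for i in flagged:
        table[i] = min(safe, key=lambda j: dist(codebook[i], codebook[j]))
    return table


def replace_codes(tokens, table):
    """Apply the lookup table to a sequence of discrete code indices."""
    return [table[t] for t in tokens]


# Toy 2-D codebook; index 3 is treated as a flagged ("toxic") code.
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.1, 5.0)]
table = build_replacement_table(codebook, flagged={3})
cleaned = replace_codes([0, 3, 1, 3, 2], table)
```

Because the substitution is a fixed lookup applied to code indices, a scheme of this kind needs no gradient updates, which is consistent with the training-free property claimed for LCR.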