OMG: Omni-Modal Motion Generation for Generalist Humanoid Control

📅 2026-06-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing approaches to humanoid robot control tend either to cover a narrow skill repertoire or to accommodate multimodal inputs poorly, which limits both their generality and their scalability. This work proposes a brain-inspired hierarchical control framework that pairs a reactive cerebellar module with an extensible cortical module capable of processing multimodal conditional inputs, including language, audio, and human motion demonstrations. Built on a diffusion-based motion generation backbone and supported by a large-scale pipeline for collecting and annotating high-quality data, the system enables full-body motor control driven by multiple modalities. The method achieves state-of-the-art performance across multiple tasks and demonstrates strong model scaling properties as well as rapid adaptation to novel modalities and action distributions.
📝 Abstract
Humanoid whole-body control has made significant progress in recent years, yet existing approaches remain limited to few-skill policies with heavy reward engineering, or motion trackers that are difficult to extend to new input modalities. We argue that the key to general-purpose humanoid control is to build a scalable brain, a module capable of reasoning with diverse conditioning modalities, atop a reactive motion tracking cerebellum, mirroring the hierarchical structure of biological motor systems. Two challenges arise in realizing this vision: acquiring a vast amount of high-quality data to achieve general purpose control, and equipping the generator with the capability to condition on compositional, extensible multi-modal inputs. We present OMG, which addresses these challenges with a meticulous data curation, filtering and labeling pipeline, as well as a diffusion-based motion generation backbone that conditions on language, audio, and human reference motions. Extensive experiments validate OMG as an omni-modal whole-body controller exhibiting state-of-the-art performance, model scaling behavior and efficient adaptation to new distributions and modalities, marking a concrete step toward foundation models for humanoid robots.
Problem

Research questions and friction points this paper is trying to address.

humanoid control
multi-modal input
general-purpose control
motion generation
whole-body control
Innovation

Methods, ideas, or system contributions that make the work stand out.

omni-modal motion generation
diffusion-based motion generation
generalist humanoid control
multi-modal conditioning
foundation models for robotics
🔎 Similar Papers
2024-05-28, International Conference on Learning Representations, Citations: 10