🤖 AI Summary
Existing robot learning policies struggle to generalize across novel environments, tasks, and heterogeneous platforms. This work proposes a world-task factorization framework that decouples policy representations into an intent-agnostic “world factor” and a task-logic-defined “task factor,” formalizing their asymmetry through Bayesian model evidence. The architecture employs gradients as an interface, integrating differentiable recursive estimation graphs with compact policy representations to enable efficient fusion of analytical world models and task-specific cost signals. Evaluated across diverse robots, environments, and multimodal tasks, the approach significantly outperforms end-to-end and heuristic baselines, demonstrating zero-shot generalization and cross-platform transfer capabilities. The method has also been successfully deployed on real robotic hardware.
📝 Abstract
Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. To achieve this, we must structurally factor the policy, which is a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling, to hand-designing it via hierarchies, skill libraries or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: it aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam razor's penalty on task parameters. We instantiate this factorization by pairing AICON, a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators, with a compact, learned policy that modulates gradient paths. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization across three problems that encompass heterogeneous robots, environments, task logic and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.