HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling

📅 2026-06-03

🤖 AI Summary
This work addresses a failure mode in physical-domain reinforcement learning: naively widening environmental randomization often causes policy collapse, which undermines robustness and generalization. The authors propose a curriculum learning framework governed by recoverability constraints, which expands the physical domain progressively through a checkpointed frontier curriculum. Harder dynamics are introduced only within regions where the policy remains recoverable, and rollback and boundary-refinement mechanisms keep each expansion step structured. The study makes recoverability the central principle for domain expansion. It shows that physical-domain augmentation is neither uniform across physical axes nor monotonic in the number of domains, and that offline distillation cannot replace online curriculum design. On simulated quadrupedal locomotion tasks, the method outperforms broad-domain randomization and offline expert distillation and is more robust to complex physical perturbations.
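The recoverability-gated expansion loop summarized above can be illustrated with a toy sketch. This is not the authors' implementation. The functions `train_surrogate`, `is_recoverable`, and `frontier_curriculum` are hypothetical. So are the per-axis "skill" model standing in for a policy, the `reach` parameter (how far beyond current competence rollouts still yield corrective data), the one-axis-at-a-time ordering, and the step-halving rule used for boundary refinement. All of these are assumptions chosen for illustration.

```python
def train_surrogate(skill, frontier, reach):
    """Hypothetical stand-in for an on-policy training phase.

    An axis is learned only if its range lies within `reach` of current skill
    (rollouts stay recoverable and yield corrective data); otherwise rollouts
    collapse and skill on that axis degrades.
    """
    new_skill = dict(skill)
    for axis, hi in frontier.items():
        if hi - skill[axis] <= reach:
            new_skill[axis] = max(skill[axis], hi)
        else:
            new_skill[axis] = skill[axis] * 0.5
    return new_skill


def is_recoverable(skill, frontier):
    """Toy recoverability check: the policy covers every axis of the frontier."""
    return all(skill[axis] >= hi for axis, hi in frontier.items())


def frontier_curriculum(targets, init_step, reach, min_step=1e-3, max_iters=100):
    """Toy checkpointed frontier curriculum with rollback and step halving."""
    skill = {axis: 0.0 for axis in targets}      # toy "policy": per-axis competence
    frontier = {axis: 0.0 for axis in targets}   # current randomization half-widths
    step = {axis: init_step for axis in targets}
    checkpoint = (dict(skill), dict(frontier))   # last recoverable (policy, frontier)
    stalled, rollbacks = set(), 0

    for _ in range(max_iters):
        pending = [a for a in targets if frontier[a] < targets[a] and a not in stalled]
        if not pending:
            break
        axis = pending[0]  # staged ordering: widen one axis at a time (assumption)
        candidate = dict(frontier)
        candidate[axis] = min(targets[axis], frontier[axis] + step[axis])

        trained = train_surrogate(skill, candidate, reach)
        if is_recoverable(trained, candidate):
            # Accept the expansion and checkpoint the new policy/frontier pair.
            skill, frontier = trained, candidate
            checkpoint = (dict(skill), dict(frontier))
        else:
            # Roll back to the last recoverable checkpoint, then refine the boundary.
            skill, frontier = dict(checkpoint[0]), dict(checkpoint[1])
            rollbacks += 1
            step[axis] /= 2
            if step[axis] < min_step:
                stalled.add(axis)  # boundary reached: stop widening this axis
    return frontier, skill, rollbacks


final, skill, rollbacks = frontier_curriculum(
    {"friction": 1.0, "payload_mass": 1.0}, init_step=0.8, reach=0.3
)
print(final, rollbacks)
```

With these toy settings, the first two proposals on each axis (steps of 0.8 and 0.4) exceed `reach`. They are rolled back, and the halved step of 0.2 then lets both axes reach their targets, for four rollbacks in total. Because rollback restores the last checkpoint instead of continuing from the collapsed policy, a failed expansion costs only the wasted attempt and leaves earlier progress intact.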
📝 Abstract
Scaling robust robot policies requires more than broader randomization, because physical-domain experience must remain organized and learnable throughout training. We study when a policy can benefit from harder physics and identify recoverability as a central constraint in on-policy physical-domain scaling. In on-policy training, new dynamics are useful only insofar as they remain close enough to the current policy to generate corrective on-policy data, rather than collapsing rollouts into unrecoverable failures. Using quadruped locomotion as a physically demanding benchmark for embodied generalization, we introduce HORIZON, a checkpointed frontier curriculum that expands physical domains only within the current policy's recoverable boundary. HORIZON uses rollback and boundary refinement to govern each expansion step, turning fixed randomization into a continual process of physical-domain growth. Experiments reveal three regularities of physical-domain expansion. First, direct domain widening is uneven across physical axes and often unlearnable without staged ordering. Second, domain composition is non-monotonic, and adding more domains beyond a compact core can dilute recoverable joint samples and reduce overall robustness. Third, offline distillation of isolated experts cannot substitute for the joint interaction generated by on-policy curriculum. Together, these results frame physical-domain generalization as a continual growth problem for embodied control, with recoverability as the organizing principle for on-policy expansion.
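The second regularity, that extra domains can dilute recoverable joint samples, can be illustrated with simple arithmetic under a strong, hypothetical assumption. Suppose each randomized axis is sampled independently, and a fixed fraction of each axis's samples falls inside the current policy's recoverable region. The fraction of jointly recoverable episodes is then the product of these per-axis fractions. The axis names, fractions, and rollout budget below are made up for illustration and are not results from the paper.

```python
from math import prod


def recoverable_joint_samples(budget, per_axis_fraction):
    """Expected recoverable episodes under independent per-axis sampling (toy model)."""
    return budget * prod(per_axis_fraction.values())


budget = 10_000  # hypothetical fixed rollout budget per training phase
core = {"friction": 0.9, "payload_mass": 0.9}
extended = {**core, "motor_strength": 0.8, "terrain_roughness": 0.7}

print(recoverable_joint_samples(budget, core))      # about 8100
print(recoverable_joint_samples(budget, extended))  # about 4536
```

In this toy model, adding two axes that are each mostly recoverable on their own still cuts the jointly recoverable rollouts nearly in half. This matches the qualitative point that a compact core can yield more usable on-policy data than a broader domain mixture.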
Problem

Research questions and friction points this paper is trying to address.

recoverability
physical-domain scaling
on-policy training
embodied generalization
curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

recoverability
curriculum learning
physical-domain scaling
on-policy training
embodied generalization