Dynamics Are Learned, Not Told: Semi-Supervised Discovery of Latent Dynamics Geometries For Zero-Shot Policy Adaptation

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited generalization of reinforcement learning policies under unmodeled or time-varying dynamics by proposing a trajectory-outcome-driven implicit dynamics representation that eschews reliance on predefined physical parameters. A task-specific smooth latent space is constructed via semi-supervised contrastive learning, and the authors theoretically establish a monotonic relationship between the regret bound in the target domain and the Lipschitz constant of the trajectory encoder. Leveraging this insight, they enforce Lipschitz constraints to optimize the geometry of the latent space, thereby enhancing robustness. Experiments on MuJoCo benchmarks demonstrate that the proposed method substantially outperforms parameter-centric baselines, effectively handling complex dynamics shifts while improving in-domain stability and interpretability of the latent representation.
📝 Abstract
Real-world dynamics shifts pose a critical challenge for reinforcement learning in robotics, as policies tightly coupled to nominal environments often fail catastrophically when physical conditions change. Most existing methods rely on encoding explicitly identified physical parameters into a latent context, a parameter-centric paradigm that depends on pre-specified axes of variation and becomes brittle under unmodeled or compound dynamics changes. We revisit dynamics adaptation from an outcome-centric perspective: rather than telling policies what the dynamics are, we enable them to learn how dynamics affect interaction outcomes. Theoretically, this is grounded in a monotonic relationship between target-domain regret and the Lipschitz constant of a trajectory dynamics encoder. Practically, this constant can be upper-bounded through contrastive learning, yielding a smooth, task-relevant latent topology without privileged dynamics information. On MuJoCo benchmarks, our method consistently outperforms parameter-centric baselines under severe dynamics shifts, including unmodeled and time-varying parameters, while also improving in-distribution stability and latent interpretability. Overall, these results validate that controlling latent geometry is a principled mechanism for robust adaptation.
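The abstract's practical claim, that contrastive learning can shape a smooth trajectory encoder, can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a generic InfoNCE loss, trajectories flattened into fixed-length vectors, and a hypothetical linear encoder, and it pairs the loss with an empirical probe that lower-bounds an encoder's Lipschitz constant from sampled pairs. All function names are illustrative.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in for the paper's objective).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def empirical_lipschitz(encoder, inputs):
    """Largest ratio ||f(x_i) - f(x_j)|| / ||x_i - x_j|| over all sampled pairs.

    This is only a lower bound on the true Lipschitz constant of `encoder`.
    """
    z = encoder(inputs)
    dz = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    dx = np.linalg.norm(inputs[:, None, :] - inputs[None, :, :], axis=-1)
    mask = dx > 1e-12  # skip identical pairs (including i == j)
    return float((dz[mask] / dx[mask]).max())


def linear_encoder(weights):
    """Hypothetical encoder: a single linear map from trajectory to latent."""
    return lambda x: x @ weights.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trajectories = rng.normal(size=(32, 12))  # assumed flattened trajectories
    weights = rng.normal(size=(4, 12))
    encode = linear_encoder(weights)

    z = encode(trajectories)
    z_pos = encode(trajectories + 0.01 * rng.normal(size=trajectories.shape))
    print("InfoNCE loss:", info_nce_loss(z, z_pos))
    print("Empirical Lipschitz estimate:", empirical_lipschitz(encode, trajectories))
    print("Spectral norm (exact constant for a linear map):", np.linalg.norm(weights, 2))
```

For a linear encoder the exact Lipschitz constant is the spectral norm of the weight matrix, so the empirical estimate can be checked against it. For a nonlinear encoder no such closed form is generally available, and the pairwise probe only indicates how sharply nearby trajectories are separated in the latent space.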
Problem

Research questions and friction points this paper is trying to address.

dynamics shifts
reinforcement learning
zero-shot adaptation
latent dynamics
robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent dynamics geometry
zero-shot policy adaptation
contrastive learning
Lipschitz continuity
outcome-centric representation
Zhiming Xu
University of Virginia
LLM inference, machine learning system
Weitao Zhou
Tsinghua University
Autonomous Driving, Reinforcement Learning
Xianghui Pan
Department of Electronics and Information Engineering, Tongji University, Shanghai, China
Nanshan Deng
School of Vehicle and Mobility, Tsinghua University, Beijing, China; Rimbot, Beijing, China
Chengju Liu
Department of Electronics and Information Engineering, Tongji University, Shanghai, China
Qijun Chen
Department of Electronics and Information Engineering, Tongji University, Shanghai, China
Chenpeng Yao
Department of Electronics and Information Engineering, Tongji University, Shanghai, China