🤖 AI Summary
Existing contrastive reinforcement learning (CRL) methods struggle to model the abrupt changes in dynamics that interactions such as contact and grasping cause in object manipulation. This work models manipulation dynamics as a piecewise-smooth Markov process and introduces Interaction-weighted Resampling (IWR), which resamples around the phases before, during, and after interactions so that the learned representation preserves the mode boundaries that determine future reachability. The approach explicitly links interaction-induced mode switches to reachability structure, yielding an interaction-aware CRL framework. In simulation, the method achieves an average improvement of 19.8% over prior CRL methods; on a real-world robot air hockey task, it raises goal-hitting success from 25% to 60%, which the authors describe as the first real-world goal-conditioned robot air hockey agent capable of hitting goals.
📝 Abstract
Contrastive Reinforcement Learning (CRL) has seen recent success in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, which induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around the phases before, during, and after interactions, encouraging the learned representation to preserve the mode boundaries that determine future reachability and thereby capture multi-modal, piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with a 19.8% average improvement in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving success from 25% to 60%. Project Page: IWR-arxiv.github.io.
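To make the idea of resampling around interaction phases concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the per-timestep binary interaction flags, the symmetric `window`, the constant `boost`, and the function names `interaction_weights` and `resample_indices` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def interaction_weights(
    flags: Sequence[bool], window: int = 2, boost: float = 4.0
) -> List[float]:
    """Return normalized sampling weights over the timesteps of one trajectory.

    Assumption (not from the paper): every timestep within `window` steps
    before or after a flagged interaction gets weight 1 + boost, and all
    other timesteps get weight 1.
    """
    n = len(flags)
    if n == 0:
        return []
    raw = [1.0] * n
    for t, active in enumerate(flags):
        if not active:
            continue
        # Cover the phases before, during, and after the interaction.
        start, stop = max(0, t - window), min(n, t + window + 1)
        for i in range(start, stop):
            raw[i] = 1.0 + boost
    total = sum(raw)
    return [w / total for w in raw]


def resample_indices(
    flags: Sequence[bool],
    num_samples: int,
    window: int = 2,
    boost: float = 4.0,
    seed: int = 0,
) -> List[int]:
    """Draw timestep indices with probability proportional to the weights."""
    rng = random.Random(seed)
    weights = interaction_weights(flags, window, boost)
    return rng.choices(range(len(flags)), weights=weights, k=num_samples)


if __name__ == "__main__":
    # Hypothetical 10-step trajectory with a single contact at step 5.
    flags = [False] * 10
    flags[5] = True
    print(interaction_weights(flags, window=1))
    print(resample_indices(flags, num_samples=8, window=1))
```

In a CRL training loop, indices drawn this way could be used to pick the anchor states from which future goal states are sampled, so that transitions near a mode switch appear more often in each batch. How IWR actually defines and applies its weights should be taken from the paper itself.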