AI Summary
This work addresses the challenge of autonomous robotic shaping in granular media (e.g., sand), where high-dimensional configuration spaces and complex, non-rigid dynamics hinder reliable control. We propose a vision-based reinforcement learning framework tailored for physical shaping tasks. Our method integrates stereo visual perception with a compact state representation, employs a sparse, task-oriented reward function designed for granular dynamics, and trains a robot policy using deep RL with a cubic end-effector. Deployment leverages simulation pretraining followed by real-world fine-tuning. Key contributions include: (i) a lightweight observation space mitigating the curse of dimensionality inherent in granular systems; (ii) a minimal yet effective reward formulation aligned with non-rigid medium dynamics; and (iii) the first demonstration of end-to-end vision-guided closed-loop control for shaping in real sand. Experiments on both simulation and physical platforms show significant improvements over two baselines, validating the approach's accuracy and robustness in structural shaping.
Abstract
Autonomous manipulation of granular media, such as sand, is crucial for applications in construction, excavation, and additive manufacturing. However, shaping granular materials presents unique challenges due to their high-dimensional configuration space and complex dynamics, where traditional rule-based approaches struggle without extensive engineering effort. Reinforcement learning (RL) offers a promising alternative by enabling agents to learn adaptive manipulation strategies through trial and error. In this work, we present an RL framework that enables a robotic arm with a cubic end-effector and a stereo camera to shape granular media into desired target structures. We show the importance of compact observations and concise reward formulations for the large configuration space, validating our design choices with an ablation study. Our results demonstrate the effectiveness of the proposed approach for training visual policies that manipulate granular media, including their real-world deployment, and show that it outperforms two baseline approaches.