AI Summary
Existing approaches do not adequately use force signals to model contact-interaction dynamics or to enable robust feedback correction, which limits their performance in high-precision, contact-intensive manipulation tasks. This work proposes FAWAM, a Force-Aware World Action Model that, for the first time, deeply integrates six-dimensional force/torque information into the action-generation and prediction pipeline of a world model. FAWAM modulates action generation with an encoding of historical force signals, jointly predicts future actions and end-effector wrenches, and adds a residual correction module driven by the predicted wrench trajectory, enabling real-time, force-guided refinement within a perception-prediction-execution loop. Evaluated across diverse real-world contact-rich tasks, FAWAM improves the average success rate by 36.25% over vision-only baselines and by 21.25% over current force-aware methods.
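The summary does not say how the force encoding modulates action generation or how the two predictions are produced. One common way to condition features on an auxiliary signal is FiLM (feature-wise linear modulation): a per-channel scale and shift computed from the conditioning input. The sketch below uses that technique as a stand-in, under explicit assumptions, and is not FAWAM's actual architecture. The hand-crafted force encoder, the names `ForceConditionedHead` and `encode_force_history`, the 7-D action layout, the feature sizes, and the random untrained weights are all hypothetical, and the world-model backbone is omitted entirely (`obs_feat` stands in for whatever features it produces).

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product in plain Python."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Untrained placeholder weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode_force_history(history: Sequence[Sequence[float]]) -> Vector:
    """Summarize a window of 6-D wrenches [Fx, Fy, Fz, Tx, Ty, Tz].

    Hypothetical hand-crafted encoder: per-axis mean over the window
    concatenated with the most recent reading (12 features).
    """
    if not history:
        raise ValueError("force history must contain at least one wrench")
    n = len(history)
    mean = [sum(w[axis] for w in history) / n for axis in range(6)]
    return mean + list(history[-1])


class ForceConditionedHead:
    """FiLM-style force conditioning followed by joint action and wrench heads."""

    def __init__(self, feat_dim: int, horizon: int, action_dim: int = 7, seed: int = 0):
        rng = random.Random(seed)
        self.horizon = horizon
        self.action_dim = action_dim
        # Force features (12) -> per-channel scale and shift for the observation features.
        self.w_gamma = random_matrix(feat_dim, 12, rng)
        self.w_beta = random_matrix(feat_dim, 12, rng)
        # Two heads read the same force-modulated features.
        self.w_action = random_matrix(horizon * action_dim, feat_dim, rng)
        self.w_wrench = random_matrix(horizon * 6, feat_dim, rng)

    def __call__(
        self, obs_feat: Sequence[float], force_history: Sequence[Sequence[float]]
    ) -> Tuple[List[Vector], List[Vector]]:
        f = encode_force_history(force_history)
        gamma = matvec(self.w_gamma, f)
        beta = matvec(self.w_beta, f)
        # FiLM modulation, channel by channel: h = (1 + gamma) * x + beta.
        h = [(1.0 + g) * x + b for x, g, b in zip(obs_feat, gamma, beta)]
        a_flat = matvec(self.w_action, h)
        w_flat = matvec(self.w_wrench, h)
        d = self.action_dim
        # Split flat outputs into a chunk of future actions and future wrenches.
        actions = [a_flat[t * d:(t + 1) * d] for t in range(self.horizon)]
        wrenches = [w_flat[t * 6:(t + 1) * 6] for t in range(self.horizon)]
        return actions, wrenches
```

For example, `ForceConditionedHead(feat_dim=16, horizon=4)([0.5] * 16, [[0.0, 0.0, -2.0, 0.0, 0.0, 0.1]] * 10)` returns four 7-D actions and four 6-D wrenches. Because both heads read the same modulated features, a different force history changes the predicted actions and wrenches together, which is the coupling the summary describes.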
Abstract
Force signals provide critical interaction cues for contact-rich robotic manipulation. However, existing methods mostly use force as an additional observation modality, without fully exploiting its role in modeling future interaction dynamics or guiding execution-time feedback correction. In this paper, we propose FAWAM, a force-aware world action model that incorporates force information at three levels: perception, prediction, and closed-loop execution. FAWAM first encodes historical 6-axis force/torque signals to modulate action generation, then jointly predicts future actions and end-effector wrenches to explicitly model contact evolution. It further introduces a residual correction module that uses the predicted wrench trajectory as an execution-time reference to refine actions online based on real-time force feedback. Real-world experiments across multiple contact-rich tasks show that FAWAM improves the average success rate by 36.25% over vision-only baselines and 21.25% over existing force-aware baselines, demonstrating the effectiveness of our force-aware framework for robust contact-rich manipulation.
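The abstract does not specify how the residual correction module computes its refinement. As an illustrative stand-in only, the sketch below uses a hand-written proportional, admittance-style rule: at each execution step it compares the predicted wrench (the reference) with the measured one and adds a clipped translational offset to the nominal action. The function names, the gain and step limit, the action layout (first three entries are Cartesian translation deltas in metres), and the sign convention (forces are those the end effector applies to the environment, expressed in the same frame as the translation deltas) are all assumptions; torques are ignored.

```python
from typing import Callable, List, Sequence

Wrench = Sequence[float]  # [Fx, Fy, Fz, Tx, Ty, Tz] in N and N*m


def residual_correction(
    nominal_action: Sequence[float],
    wrench_ref: Wrench,
    wrench_meas: Wrench,
    gain: float = 0.002,      # hypothetical compliance, metres per newton
    max_step: float = 0.005,  # hypothetical per-step limit, metres
) -> List[float]:
    """Return the nominal action plus a clipped force-error residual.

    Hypothetical rule: if the measured force falls short of the predicted
    reference, move further along the reference direction; if it exceeds
    the reference, back off. Only the force axes are corrected.
    """
    corrected = list(nominal_action)  # leave the caller's action untouched
    for axis in range(3):
        error = wrench_ref[axis] - wrench_meas[axis]
        step = max(-max_step, min(max_step, gain * error))
        corrected[axis] += step
    return corrected


def execute_chunk(
    actions: Sequence[Sequence[float]],
    wrench_refs: Sequence[Wrench],
    read_wrench: Callable[[], Wrench],
) -> List[List[float]]:
    """Refine each action online using a fresh sensor reading per step.

    `read_wrench` is a hypothetical callback standing in for the robot's
    real-time force/torque sensor interface.
    """
    executed = []
    for action, ref in zip(actions, wrench_refs):
        executed.append(residual_correction(action, ref, read_wrench()))
    return executed
```

With a predicted `Fz` of -10 N (pressing down) and a measured -9 N, the commanded z-delta moves from -0.001 m to about -0.003 m, pressing slightly harder; with a measured -15 N the clipped step instead raises it to about +0.004 m. A learned module, as the abstract implies, could replace the fixed gain while keeping the same inputs: the nominal action, the predicted wrench, and the live measurement.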